PREFACE


If the instruction of students is to keep pace with the rapid develop- 
ment of statistical science, frequent publication of books based on the 
most recent knowledge in the field is required. Therefore, the author's 
primary aim is to supply students with a book that is built on the recent 
advances in statistical theory and practice. Because they approach the 
study of statistics with different interests, aims, and backgrounds, it is 
not feasible to write one text that can meet the requirements of all 
classes of students. Whatever their approach, these students will have 
one thing in common, namely, they will need to acquire a thorough func- 
tional understanding of statistical principles to make intelligent use of 
statistics. The differences in interpretation of authors of statistical texts 
as to what functional understanding involves seem to range between the 
belief that statistical principles are working rules to be learned as quickly 
as possible for their utilitarian value and the conviction that an advanced
knowledge of pure mathematics is the first requisite for the exposition 
of statistical principles. 

The author does not believe that either of these points of view is best 
for most of the students who need statistical training for their work. 
The former is likely to lead to blind, rule-of-thumb application of sta- 
tistical formulas; the latter is indispensable only for those who are to 
become professional statisticians or mathematical statisticians. Neither 
practical applications nor mathematical analysis is excluded from this
book. In fact, problems have been used abundantly to illustrate prin- 
ciples or results. Also, a number of problems have been inserted whereby 
the student may test his understanding of the statistical theory. The 
author is convinced that the detailed working through of problems is 
fundamental to a functional understanding of statistical techniques. 
Similarly, application of the principles underlying the design of
experimental or observational projects is necessary if a thorough grasp of
these principles is to be secured. Experiences in application are necessary for
the student if he is to design effective experiments of his own or to evalu- 
ate those of others. However, the problems are considered as auxiliaries 
to the study of the principles. 

Again, the mathematical analysis is not excluded because without
mathematics there could be no serious study of statistics. But mathematics
has been viewed as the servant and not as the master. The
question of how much knowledge of mathematics ought to be assumed
is difficult to answer. Not many students in the social and biological
sciences have a knowledge of calculus. It is the view of the author that
the student should have at least a background in calculus to be able to 
follow the theoretical material, which cannot be advantageously treated 
without considerable use of calculus. The understanding of statistical 
as well as other scientific principles is relative, dependent upon what 
intelligence, technical background, and experience the student may have. 
Students with more knowledge of mathematics usually gain more com- 
plete understanding of the mathematical formulation underlying a par- 
ticular statistical principle. They should, therefore, have the oppor- 
tunity of utilizing their more adequate preparation. However, the 
number of students with special mathematical training is very limited. 
But the student with, for example, no calculus, may omit the few sec- 
tions of the book in which the calculus is used. Even without these 
sections, he should be able to acquire a considerable and continuous 
knowledge of the essentials of statistics from the non-technical, logical 
treatment accorded to most of the essentials in the book. 

It should also be explained where the book starts and where it ends. 
This book does not start from the very beginning of its subject. Many 
upper-class students and most graduate students have had an introduc- 
tion to statistics, usually called descriptive statistics, dealing with the 
elementary processes in the reduction of data. Such preliminary train- 
ing is assumed. If the student does not have it, the instructor may 
prefer to begin the subject by laying this elementary base himself. 
This book deals with the principal objective of statistics, which is to 
provide indispensable tools and methods for designing and executing 
experimental and other observational projects and for analyzing and 
interpreting the results. 

It would be advantageous to allot a full academic year to the objec- 
tives of statistical methods as presented here. However, adjustment 
can be made when less time is available by selecting certain portions 
regarded as most fundamental by the instructor. 

It was decided to bring the book to a close when its purpose was 
accomplished, that is, after the common principles of statistics had been 
investigated. The aim was not to present topics of interest to a few 
students only.

The book is based on the content of a year’s course in statistics, and 
was developed over a period of approximately ten years, primarily for 
graduate students in education and in psychology. During this time, 
content and method were continuously revised in light of experiences and 
scientific development of the subject. 

The author considers himself especially fortunate in having been a 
volunteer worker at the Galton Laboratory for a year, during which 
time he studied with Professor R. A. Fisher, foremost in laying the founda- 
tions of modern statistical methods. During this period he also profited 
from the lectures of and conferences with Professors J. Neyman and
Egon S. Pearson.

The chief sources of information and help in developing the book have 
been the many serious students whose criticisms and reactions were of 
inestimable value in attaining a clearer presentation of statistical methods. 
Most significant was the help from my very capable assistants. Among 
these should be especially mentioned Dr. Cyril Hoyt, Dr. Fei Tsao, Dr. 
Garland Kyle, and Mr. Stanley Clark, all of whom have made direct 
contributions to this work. 

I am greatly indebted to Dr. Robert W. B. Jackson of the Uni- 
versity of Toronto for his critical reading and constructive criticisms of 
the work in manuscript from which I received valuable suggestions for 
its improvement. 

I am especially grateful to the following authors and publishers for 
their kind permission to reproduce certain tables which are given in the 
Appendix: 

(1) I am indebted to Professor R. A. Fisher and Dr. F. Yates, also 
to Messrs. Oliver and Boyd Ltd., Edinburgh, for permission to reprint 
Table No. III, Distribution of t, and Table No. IV, Distribution of χ²,
from their book, Statistical Tables for Biological, Medical, and Agricultural 
Research; 

(2) Professor George W. Snedecor and The Iowa State College Press 
for permission to reproduce Table 10.7—5% and 1% Points for the Dis- 
tribution of F from Statistical Methods (Fourth Edition), 1946; 

(3) Professor Egon Pearson, Editor of Statistical Research Memoirs,
for permission to reproduce Table IV: 5% limits for L₁, and Table V: 1% limits for L₁,
computed by P. P. N. Nayer.

PALMER O. JOHNSON

STATISTICAL METHODS
IN RESEARCH 


CHAPTER I 
THE REALM OF STATISTICS 


STATISTICS IN DAILY LIFE


Our entrance into and departure from this world are recorded as 
statistical events. Birth and death, marriage and divorce, the school 
attendance of our children, the crops grown by farmers, the number of 
miles flown by commercial planes, the hours of our labor, the output of 
manufacturing plants, the acres of wood demanded for paper, the hours 
of sunshine, the inches of snowfall—all such events and activities are 
recorded somehow and somewhere. Myriads of such experiences and 
events affecting the daily lives of roundly two billion human beings lie 
behind the statistical data condensed in volumes, published and unpub- 
lished. In reverse, we are daily translating into their real meaning 
statistical data obtained from newspapers, radio reports, lectures, books, 
and conversations. We act in accordance with the reality implied in 
statistical data when we conserve fuel which is going to be scarce, when 
we ship wheat which will be necessary after a poor harvest in a foreign
country, when we take precautionary measures against a disease of which
unusually many cases have been recorded.

The conception of statistics as having to do with figures is the most 
popular one, and for good reasons. The public is constantly exposed to
statistical data occurring in advertisements, in arguments, and in the
distribution of information. If something is said to have been
statistically proved, opposition is supposed to become quiescent.
Everywhere the ordinary citizen needs some ability to distinguish between
what is truth and what is falsehood. In a democracy he needs it most
where he participates in the settlement of public problems and
contributes toward the growth of public opinion. Citizens not only should
be able to look at controversial questions scientifically and
dispassionately; they should also acquire the habit of doing so. Education should
prepare them to cope intelligently with the problems of their lives and
times; they must learn not only to think for themselves but likewise to
act for themselves. There is danger in the educational system of a
democracy when materials and methods of instruction are not keyed to
the formation of the scientific attitude and to the development of the
ability to use the scientific method. The ability to use and scrutinize
data, to look beneath the surface of things and to discern relations
between reality and given data, affords an important safeguard against
the dangers of omnipresent propaganda. The problem is to educate
man so that he would rather be guided by fact than by emotion.

There is a noticeable similarity between arithmetic and statistics with 
respect to use in daily life. Arithmetic is so woven into the fabric of our 
daily life and thought that we use it very often and almost subconsciously. 
With respect to statistics we need only to recall such phrases as “highly
exceptional,” “relatively constant,” “increases the probability,” and
“on the average.” However, arithmetic is a subject taught in all
elementary schools, whereas statistics is taught scarcely at all, although 
its content is likely no more difficult than that of arithmetic at the same 
educational level. 

The practice of applying certain statistical methods, however simple 
they may be, is a critical social need for all. We must not forget that 
even the specialist lives in general society during at least two-thirds of 
the time. For this longer period he is a layman and needs the best
possible quality of layman's understanding. For his guidance in the current
of general human living, he needs statistical training. 

The most important and undisputed use of statistics in daily life is
connected with all the activities of political, social, and commercial 
institutions which determine the economic and cultural life of a nation. 
In the realm of policy it is the function of statistics to measure the impor- 
tance of various problems and to place them in a proper perspective. In 
many branches of government factual data already are governing policy 
to a great extent. For instance, the decision to build a number of new 
schools and to engage more teachers implies legislative measures which 
are based on statistical investigations of the school-leaving age, the 
rising birth rate, the increase of population through migration, and other 
factors. Problems in the economic, industrial, and social fields, such as 
increase or decrease of employment, shortage of houses, expansion or 
contraction of existing plants, decrease or increase of crime—these and 
thousands of others should be solved statistically before political action 
can be considered. The whole structure of the national budget depends 
on the sound appraisal of the relationship between potential sources of 
revenue and planned expenditure. Local authorities need statistical 
information for the districts they serve; national agencies need it for the 
country; the organization of the United Nations needs it for the world. 

It is essential that governmental agencies be prepared to make the 
fullest possible use of modern statistical methods. The public is entitled 
to the benefit that may be derived from the progress in research. Old 
methods are often wasteful or have been found unreliable. One should 
expect that government, the foremost user of statistics on a large scale, 
should pioneer in the application of modern statistical methods. 

The urge to apply modern statistical developments seems to be greater 
where an immediate personal advantage is involved in commercial life. 
There is even one branch of commercial activity which owes its existence
and all-pervasive development to statistics: insurance. In many other
branches a combination of technical and statistical knowledge is used. 
The planning of a large factory or combine is now a part of what is known 
as "scientific management." Many firms have planning departments 
which use statistical returns and charts to a great degree. An unusual 
example is furnished by the seemingly sentimental enterprise of manu- 
facturing greeting cards, of which approximately three billion are mailed 
each year in the United States, involving an annual sale of 135 million 
dollars and postal charges of 100 million dollars. “Statistical planning,”
taking place in a special department one and a half years ahead of the
exhibition of a card in a store, is the first step toward the sale.
Everywhere knowledge and experience are needed for planning production,
distribution, and sales, although the statistical methods used are often 
not very elaborate. In administration, statistics provide measures of 
performance and efficiency. Although the data do not state the causes of
inefficiency, if any, and do not directly effect improvement, they are 
pointers; their value depends entirely on the use which is made of them. 

Underlying all planning is the guidance derived from statistical data 
of the past toward the goals desired for the future. An insurance com- 
pany quoting rates for an endowment life policy to mature twenty years 
hence can and must do so on the basis of an estimate of future interest 
rates and past mortality experiences. The size of a new factory is deter- 
mined partly by estimates of future demands for the products to be 
manufactured. Most goods for consumption are made or ordered long 
before they are sold. Consumers, nowadays starting their own organiza- 
tions, no less than producers and managers, are dependent, for forceful 
action, on the instruments provided by statistical methods. Thus it is
profitable to be able to forecast trends for all economic groups: for
business management comprising large firms with international, long-range
distributions as well as for the individual merchant supplying the
immediate needs of a local neighborhood.

On the other side, employees everywhere are finding that it is of vital 
importance to labor and its aggregate organizations to use statistics, 
which represent tools in the formation of their organizations and programs. 

The United States excels in using methods for forecasting trends in 
every field of industry and public life. 


STATISTICS IN THE SCIENCES AND THE ARTS


Statistical devices have made their greatest advances in the scientific
and technical branches of industry, where enterprise and science not only
meet but are amalgamated.

Perhaps no branch of mathematical science has had a more rapid 
growth than has the science of statistics. In the span of the last sixty
or eighty years the methods of statistics and the probability calculus have
infiltrated one branch of science after another, until they now hold a 
central position in physics, biology, meteorology, chemistry, and 
astronomy. Furthermore, statistics is also growing in significance in a 
number of other fields, such as the political and social sciences. To what 
may this remarkable growth most likely be attributed? 

The introduction of a new theoretical device into a field of knowl- 
edge may often seem incidental in that when it first becomes available, 
it is used when it appears to be of value, just as the microscope, X rays, or 
integral equations may be tried out. In the case of statistics, however, its 
introduction was not just casual. 

At first statistics was used apologetically, perhaps with the excuse 
that it was only an expedient to help overcome a temporary shortcoming, 
as in reducing large amounts of observational material in order to study 
details. Thus at first the new “weapon” was tried with the expectation 
that it could be used in the study of detail, as in the study of hereditary 
transmission in individuals from one generation to another, or, as in 
physics, to fill in the gap in knowledge in gas theory with respect to initial 
coordinates and velocities of the single atoms. 

Attitudes in scientific research shift at times, perhaps unintentionally. 
Interest in individuals shifted to the mechanism underlying the behavior 
of aggregates of individuals. It was suddenly realized that even if the 
individual case could be studied in detail, it would be necessary to follow 
up thousands of individual cases in order ultimately to integrate them all 
into one statistical enumeration. 

Charles Darwin was fully appreciative of the essential function of 
statistics in biological study. His theory depended on the law of large 
numbers. Every living species is continually producing a multitude of 
individuals. On the whole, the better-fitted ones live more abundantly
and have a better chance of survival. The large geometrical progression
of potential offspring and the enormous destruction of actual offspring
to be inferred from it constitute the statistical mechanism operating to
produce the very small increase in the chances of survival that a small
favorable variation bestows.

The change in the status of statistics as a subordinate device was
most drastic in physics. Here it came to take the dominating role of
defining the goals and showing the ways of reaching them. Thus the
entire structure of science was shaken, since it rests upon the foundation
of physics. This role of statistics has led to a new understanding of the
essential qualities of the laws of nature, namely, the change from a
deterministic formulation of laws underlying the occurrence of natural
events to one in terms of statistical regularities, based, as in Darwin’s
theory, on the law of large numbers. This transition from the
interpretation of physical laws based on the notion of causality to one derived
from statistical theories is attributable largely to Boltzmann’s¹
interpretation of the classical law of entropy, or the second law of
thermodynamics, as it is usually called. According to this interpretation, the
second law rests upon statistics. Rather, it is statistics; that is, it is a 
purely statistical law. Heat flows in the direction from higher to lower 
temperature because the chance is only one in many billions that it is 
likely to do otherwise. Events go in the direction in which it is most 
probable that they will move (Ref. 5). 

Further developments, particularly the new quantum mechanics and 
Heisenberg's uncertainty principle, have revolutionized still more the 
usual conception of the older classical physics and contributed to the
building of the edifice of the statistical conception of nature. While
these changes have been taking place, the physicists have developed their
own statistical methods, particularly quantum statistics, quite apart
from the methods of statistics in other fields. Statistical ideas are
utilized in some modern chemical theories, such as the structural formula
of certain organic substances like rubber and proteins, where chains of
molecules of different weights and lengths are postulated. For example,
chemical changes in such substances are interpreted as alterations in the 
frequency distribution of chain length. 

The significance of the general philosophical implication of the statis- 
tical formulations relating to the construction of scientific theories can 
hardly be overrated. We are more directly concerned here, however, 
with pointing out briefly the position that statistics holds today in certain 
fields of science and in technology. Since about 1920, the statistical 
approach has been accepted and welcomed by a steadily increasing 
circle of scientific workers, until today this approach is probably one of 
the most characteristic features of modern science. 

The role of statistics in science begins with the interpretation of 
measurements. Even though the methods of the natural sciences are 
the most reliable thus far designed for finding out matters of fact, the 
conclusions drawn from them are only probable, since they are based on 
evidence formally incomplete. This fact is statistically described by the 
attachment of a coefficient of error to the measurement.

¹ It should be noted that the work of Willard Gibbs followed parallel lines.

Take, for example, the measurement of the distance of the sun from
the earth, or, speaking more correctly, the semimajor axis of the earth's
orbit. This is the most important constant in astronomy, since it
establishes the scale not only of the solar system but also of the whole
universe. It is used in almost any calculation of distances and masses, of
sizes and densities of planets, of their satellites, and of the stars.
Therefore, any error in its calculation is multiplied and repeated in many
different forms. Its importance has stimulated measurements of
ever-increasing accuracy. At present the measurement is 93,005,000 ± 9000
miles (p.e.);² that is, the distance is uncertain to 1 part in 10,000. One
hundred years ago, the uncertainty was 1 part in 20. The progress in
the development of any science is indirectly given by the size of the errors
in its measurements.
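
As a rough check on the arithmetic of this example, the following minimal sketch in Python (not part of the original text) expresses the quoted probable error as a fraction of the measured distance. It also converts the probable error to a standard error on the assumption that the errors of measurement are normally distributed, in which case the probable error is about 0.6745 times the standard error. The function and variable names are illustrative only.

```python
# For a normal distribution, probable error = 0.6745 * standard error (assumed here).
PROBABLE_ERROR_FACTOR = 0.6745


def one_part_in(value, uncertainty):
    """Return N such that the uncertainty is about 1 part in N of the value."""
    return value / uncertainty


def standard_error_from_probable_error(probable_error):
    """Convert a probable error to a standard error under the normal assumption."""
    return probable_error / PROBABLE_ERROR_FACTOR


distance_miles = 93_005_000      # figure quoted in the text
probable_error_miles = 9_000     # probable error quoted in the text

parts = one_part_in(distance_miles, probable_error_miles)
sigma = standard_error_from_probable_error(probable_error_miles)

print(f"uncertain to about 1 part in {parts:,.0f}")
print(f"equivalent standard error: about {sigma:,.0f} miles")
```

The ratio comes out a little above 10,000, which agrees with the rounded figure of 1 part in 10,000 given in the text.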

Laboratory measurements in physics and chemistry are subject to
experimental errors. Considerable attention has been given recently to 
methods of controlling and evaluating all variables that might conceiv- 
ably influence the results. The purpose is to obtain reliable laboratory 
standards, such as those of capacity, frequency, and voltage. Particu- 
larly with the development of the sciences of biochemistry and biophysics, 
measurements are required on material essentially variable. A wide 
field is under development in which are used such statistical methods as 
sampling, followed by analyzing and testing of the experimental results 
as well as the closely related problem of appropriate experimental 
designs. These methods have increasingly important applications in 
industry. The same situation prevails also to a slight extent in engineer- 
ing—mostly in technical control and research. Engineers have devel- 
oped methods of their own for dealing with the variation in the materials 
which they use. It is likely that the use of statistical methods of treat- 
ing variations in these fields would be more efficient than the current 
use of the factors of safety. 

Statistical methods are indispensable tools of the industrialist who is 
concerned with the manufacture or purchase of presumably similar 
articles or units on a large scale. However efficient the control of pro- 
duction may be, the products are bound to vary, and it is necessary to 
check the extent of variation by some plan of routine testing. The 
conformity to the requirements of a consignment of raw or manufactured 
materials must be reliably established. Considerable headway has been
made in recent years in developing efficient statistical methods and 
experimental designs for meeting requirements. The productive process 
must be in a state known as one of statistical control, the criterion for 
which is: the sequence of materials must exhibit the property of random- 
ness. These are statistical problems, for the solution of which the most 
advanced statistical methods are necessary. At times, when operations 
were found lacking statistical control, statistical analysis of the results of 
routine tests has been used successfully to locate the source of the
unwanted variations. The application of statistical methods can protect 
the consumer against the vagaries of sampling and safeguard the producer 
from the losses incurred by chances “unjust” to him. 
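
The text does not say which test of randomness is meant by the criterion of statistical control. One common choice, shown here only as an assumed illustration, is the runs test about the median: too few runs suggest drift or trend in the sequence of measurements, and too many suggest systematic alternation. The sketch below is in Python; the function name `runs_test` and the sample sequences are hypothetical.

```python
import math
from statistics import median


def runs_test(values):
    """Runs test about the median.

    Returns (observed runs, expected runs, z), where z is the normal
    approximation to the departure of the observed number of runs from
    the number expected in a random sequence.
    """
    med = median(values)
    # Classify each observation as above (True) or below (False) the median;
    # observations equal to the median are dropped.
    signs = [v > med for v in values if v != med]
    n1 = sum(signs)
    n2 = len(signs) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need observations on both sides of the median")
    # A new run begins wherever two neighbours fall on opposite sides.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z


# Hypothetical routine-test results: one sequence drifts steadily upward,
# the other alternates between two levels.
drifting = [1, 2, 3, 4, 5, 6, 7, 8]
alternating = [1, 2, 1, 2, 1, 2, 1, 2]

for name, seq in (("drifting", drifting), ("alternating", alternating)):
    runs, expected, z = runs_test(seq)
    print(f"{name}: runs={runs}, expected={expected:.1f}, z={z:.2f}")
```

A value of z far from zero in either direction is evidence against randomness; with sequences as short as these the normal approximation is crude, and exact tables of the distribution of runs would ordinarily be consulted.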

² Probable error.

Meteorology is a branch of applied physical science which has a
statistical basis, since weather forecasting utilizes statistical principles
and methods. The meteorologist collects data which are relatively
complex and which are the result of multiple factors operating together 
without control. Hence, he has to apply methods of multivariate 
analysis and also statistical methods developed for dealing with serially 
correlated data. It may be expected that, with the rapid development 
of electronic calculators, striking improvement will be made in solving 
the problem of long-range weather forecasting. The great problem in 
weather forecasting at present is the lack of means to work out all the 
mathematical variables within the period that knowledge of this kind is 
useful. If valid predictions could be made of the weather long enough 
in advance, it might even become possible to do something about the 
weather. Agriculture, shipping, air travel, and other activities would 
benefit by advanced knowledge of the weather. The savings in lives, 
crops, and money would be incalculable.

In practically all branches of biology, methods of statistics are used. 
Galton, influenced largely by the ideas of Darwin, made quantitative 
studies of biological variation. Much of the recent development in the
theory and application of statistics arose to meet the need for improved 
tools designed to handle problems in agricultural and biological research. 
There was a need in these fields not only for interpreting observational 
data but also for planning experiments efficiently. 

Genetics is a branch of biological science which seeks to explain the 
resemblances and the differences that are displayed among organisms 
related by descent. Whereas the earlier work in this field was chiefly 
descriptive and empirical, the development of theories based on Mendel’s 
discoveries has brought statistical methods to bear more and more on 
the problems. In fact, highly developed statistical methods now
constitute the basis of an important part of the subject. The once conflicting
sciences of biometry and genetics are now closely integrated.

Public health, epidemiology, and vital records are statistical in char- 
acter. The collection and analysis of large masses of data are funda- 
mental in those fields. Federal and state governments collect data for 
informative and directive purposes. The study of population changes 
is somewhat specialized; its facts are the facts of life on which scientific 
planning for the future depends. Populations are recruited by birth
and depleted by death.³ The balance between them and the change in
character of the age-group patterns of the population are subjects requir- 
ing careful and critical statistical analysis. Statistical methods are 
increasing in use in research in many branches of medicine, though 
apparently the general practitioner has not been greatly affected by 
statistical ideas. Statistical methods are also fundamental in the
standardization of biological extracts. In biological assays, such as in
the calculation of the potency of penicillin, insulin, digitalis, and other
drugs, the necessary precision could not be realized except by the use of
modern statistical procedures.

³ Immigration has, of course, been an important factor in the United States.

The psychologist, particularly in the fields of experimental and applied 
psychology, needs a working knowledge of statistical methods. In a 
quantitative inquiry into a psychological problem it is generally necessary 
to measure a limited number of cases. In selecting them the psychologist
must be sure that they are effectively representative of the population
from which they are drawn. Usually at least two samples, namely,
experimental and control groups, are necessary in an experiment. These
must be so selected as to eliminate any bias of selection with respect to 
characteristics that are related to the investigation. In addition, the 
problems of measurement involve the determination of the reliability 
and validity of the instruments used. Finally, analyzing the experi- 
mental data and drawing conclusions that the data merit are essentially 
statistical procedures. 
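
One way of avoiding bias of selection, offered here as a minimal sketch rather than as the procedure the author has in mind, is to allot the available subjects to the experimental and control groups at random. The Python function `randomize_groups`, its `seed` parameter, and the subject labels below are all hypothetical.

```python
import random


def randomize_groups(subjects, seed=None):
    """Split subjects at random into (experimental, control) groups.

    The groups are of equal size when the number of subjects is even;
    otherwise the control group receives the extra subject.
    """
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(subjects)      # copy, so the caller's list is not altered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject labels.
subjects = [f"S{i:02d}" for i in range(1, 13)]
experimental, control = randomize_groups(subjects, seed=2024)
print("experimental:", sorted(experimental))
print("control:     ", sorted(control))
```

Because chance alone decides each subject's group, no characteristic of the subjects, whether measured or not, can be systematically related to the allocation.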

Applied psychology emphasizes the importance of individual differ- 
ences; it needs to develop tests for intelligence, skills, and aptitudes of 
various kinds. The allocation of individuals to places in society for 
which they are best fitted requires tests of mental and physical traits. 
The statistical methods of multivariate analysis are essential for the 
interpretation and use of such data. The future of human civilization 
depends to a great extent on the capacity of man to understand the factors 
and forces governing or controlling his own behavior. In the solution of 
these problems statistical method is likely to play a significant role. 

Psychologists have developed from orthodox statistical methods some 
variants of their own. The methods of factor analysis, for example, are 
used to describe the human mind by means of a small number of psycho- 
logical factors. 

One of the earliest uses of the term statistics was the description, at 
first verbal and later numerical, of outstanding characteristics of a state. 
The interpretation given in the first issue of the Journal of the Royal 
Statistical Society (Ref. 8) is: “Statistics may be said . . . to be the 
ascertaining and bringing together of those facts which are calculated to 
illustrate the condition and prospects of society.” Social science was
the parent of statistical method. A characteristic of the method of the 
social scientist was the restriction of his observations to circumstances 
that were not amenable to experimentation. Hence he usually dealt 
with complex cases of multiple causation. The science of economics is 
perhaps the best example of this use of statistics. 

Tippett (Ref. 6) gives three reasons why economics is dependent on 
statistics. One reason is that economic laws, if they exist, pertain to 
mass or group phenomena. The preferences, desires, and reactions of 
millions of people are manifested in economic events. The so-called
“law of supply and demand” applies very widely. The fundamental
assumption underlying the existence of sciences like economics (and 
psychology) is that statistical laws are descriptive of human behavior. 
A similar assumption underlies a rational approach to business and 
political problems. The second reason for the dependence of economic 
science on statistics is that only quantitative data, that is, statistics, can 
yield laws in the scientific sense. The third reason lies in the nature of 
economic problems. Economic experiments are usually not feasible. 
Hence, if phenomena are to be observed and explained, the method of 
study is essentially statistical rather than experimental. It is not often 
possible to isolate one or a few factors for experimental study as is done 
by the experimentalist in his laboratory. 

In economics research there are three general uses of statistics: they 
may (1) serve as information culminating in hypotheses and theories, (2) 
be applied to the testing of hypotheses or theories, and (3) furnish esti- 
mates of quantities in economic analysis. 

There has not been much cooperation between theoretical economists 
and statisticians. However, the development of statistical methods 
has been notable in economics. The increasing use of such quantitative 
concepts as prices, income, and supply and demand may mean that the 
approach of the statistician eventually will prevail over that of the 
theorist. 

We meet specific problems to which statistical analysis has been 
applied in telegraph and telephone communication, in electric-power 
distribution, in road and rail traffic, and so on. The theory of probabil- 
ity has been usefully applied in the study of the effects of chance and 
other factors in accidents. It has been noted that individuals differ in 
their proneness to suffer accidents under given conditions. 

Statistical facts and methods play a significant part in the develop- 
ment of sociology and education as sciences. The collection of statistics 
illustrative of the conditions of society has been mentioned as one of the 
earliest activities. Each national census depicts our industrial, economic, 
and social status at a given time. Social surveys are frequently con- 
ducted in different parts of the country to find out the status of unem- 
ployment, housing, the delinquency of youth, and so on. The method 
of inquiry may be by sample, with its own special difficulties and sources 
of error. Sociology stresses the interdependence of social facts and the 
need of considering them in relation to each other. The comparative 
method used at times applies the principle of varying the circumstances 
of a phenomenon with a view to eliminating variable and unessential 
factors. Thus it aims to arrive at what is indispensable and constant. 
Its primary purpose is to make provision for classification of forms of 
social relationships to facilitate causal analysis. Statistical investiga- 
tions of crime, of the causes of suicide, and of the conditions under which 
certain economic organizations arise illustrate how the comparative 
method has been applied. 

Educational statistics collected by Federal, state, and local authorities 
provide more and more the basis for educational policies and programs. 
Subjects illustrative of the amenability of educational problems to 
scientific study are: changes in the school population with respect to 
age, intelligence, and other characteristics; means of providing equality 
of educational opportunities; the location of youth with special talents. 
Likewise, numerous studies employing the experimental method, par- 
ticularly those applying modern principles of experimental design, are 
adding genuine knowledge concerning the educational process. 

National opinion polls, such as those of Gallup, Crossley, and Fortune 
magazine, use systematic methods of sampling. The development of this 
means of measuring public opinion is likely to play a significant part in 
the theory and practice of democratic government. 

Statistics is beginning to find application even in such nonscientific 
fields as the arts. In a task comparable to that of the telephone engineer 
who tabulated the frequency of principal words in order to secure the 
best possible transmission, a literary scholar has tabulated the six 
thousand most common words in English, French, German, and Spanish. 
Some points of disputed authorship have been decided by the statistical 
study of the length of sentences. The frequency with which colors and 
sound patterns occur in poetry, the number of types of imagery used by 
Shakespeare, the number of different word classes characteristic of prose 
and poetry of certain periods—all these are illustrations of statistical 
applications. Evidence of errors in the chronology of early Roman 
history has been revealed by certain life tables. The authenticity of 
paintings has been established by means of the frequency of brush marks. 

The work of the mathematical statistician is fundamental in the 
development of statistical science. Here, as in other fields of science, 
basic research contributes general knowledge which affords the means 
of solving a large number of significant practical problems, although a 
specific solution may not be provided to any one problem. The role of 
applied research is to discover complete solutions to specific problems. 
The new knowledge provided by basic research furnishes scientific 
capital, from which source practical applications must be obtained. 
Most of the mathematical theory of statistics in its present character is 
the result of research of recent decades. Perhaps in no field of science 
have the theoretical advances been so sweeping and the practical results 
of such advances so pronounced. The reason may be that the solution 
of theoretical problems was primarily rendered indispensable by the 
urgent requirements of practical research. Furthermore, the principal 
contributors to the solution of the theoretical problems discovered the 
actual need for such solutions in their direct contact with the problems of 
practical research. 

There is usually a gap between theoretical developments and practice 
in scientific fields, and this gap is also characteristic of statistics. The 
width of this gap varies in the several applied fields, and there is even a 
wide variation among workers within the same field with respect to the 
quality of statistical methods used. 

The rapid development of statistical science has, of course, left many 
problems unsolved, both theoretical and practical. It may be expected 
that theoretical studies of statistics will increase in the immediate future, 
leading to greater rigor of its theoretical structure. As is characteristic 
of all scientific subjects, statistical science is never finished and complete: 
it is dynamic, developing always. The result will be more and more 
rigorous methods (Ref. 2). This development is likely also to take in 
areas and fields where new types of observational data and new kinds 
of observations will be sought. Also, supplementary mathematical 
researches will be found necessary before workers in such fields can carry 
out their studies with the high standards of competence employed in 
fields where statistical methods are firmly rooted. 

Mention should be made of the mechanization of statistical calcula- 
tions. The generation of calculators as users of logarithms and prepared 
tables of mathematical functions and other aids has been succeeded by 
one which knows only how to produce figures mechanically. Commercial 
machines for accounting and for scientific computation have done much 
to benefit business, government, and science. It is not only in removing 
the drudgery of reducing large masses of statistical data that the mecha- 
nization of statistics is important: with the development of machines 
based upon the principles of electronics rather than of the cogwheel, the 
most complicated and advanced mathematical applications become prac- 
tically solvable for the first time. The significance of this development 
for the solution of theoretical as well as practical problems in science is 
just beginning to be realized. The impetus given to this development 
by the exigencies of World War II was very great. No matter how 
rapidly one machine is produced, when finished it seems to be almost 
obsolete, so swift is progress. Therefore, any description of the electronic 
calculator which is given here is likely to be soon superseded. 

The electronic numerical integrator and computer, the Eniac, 
invented and perfected at the Moore School of Engineering of the Uni- 
versity of Pennsylvania, does not have a single moving mechanical part. 
Only the tiniest elements of matter—electrons—move within its 18,000 
vacuum tubes and several miles of wiring. This amazing machine com- 
pletes in two hours a mathematical task which 100 trained men could do 
only in a year. Since all mathematical tasks, however abstruse or 
involved, can be reduced to basic arithmetic, if ample time is provided, 
this machine practically eliminates the time required to give the answers 
to virtually any problem. That is, basically the machine does nothing 
more than perform the fundamental arithmetic processes. This it does by the 
generation of very precisely timed electrical impulses. These impulses 
are formed at a speed of 100,000 per second, which is equivalent to one 
operation every twentieth impulse, thus adding, for instance, at the rate 
of 5,000 per second. The Eniac has four kinds of memory. One of 
these “minds” performs the task of indicating the initial and boundary 
conditions of the problem. All problems must first be broken down into 
their essentials, which are then punched on cards. These cards are then 
run through a machine unit known as the "reader." The reader acts 
as the translator of the mathematical language to the language of the 
machine, and vice versa. The values of certain scientific constants are 
introduced when required. The machine can handle numbers of 20 
digits. 

Machines have already been planned to solve problems running into 
400 stages, that is, machines which have a “memory” of 400 numbers. 
Such a machine could solve 100,000 different equations in approximately 
one minute. 

The illustrative rather than exhaustive review that has just been 
presented has attempted to portray the realm of statistics all the way 
from daily life through theoretical and applied science. If the purpose 
has been achieved, the all-pervasive character of statistics should be 
realized. A knowledge of statistics—at least of its logic and its depend- 
ence on the data of experience—is indispensable to everyone in the 
practical affairs of human society. Statistical science has likewise per- 
vaded both the theoretical and applied aspects of the biological, physical, 
and social sciences. In fact, every observable event in the behavior of 
man, as well as in the behavior of rocks and stars, is amenable to scientific 
treatment and correlation with other events. In this analysis, statistical 
methods have come to play a necessary part if such data are to be assayed 
with scientific precision and if the reliability of the information is to be 
determined with objective validity. 

The bricks of experience and the mortar of reason are the twin sup- 
ports upon which the indestructible foundation of science is built. The 
essence of science is the rational ordering of the facts of experience. In 
this process the data of experience are represented by concepts. The 
concepts are defined in a manner which facilitates the interpretation of 
rational relation between experiences. Although the derivation of these 
relations involves pure reasoning, statistical methods based on the theory 
of probability contribute in the drawing of inference and conclusions by 
specifying the degree of uncertainty involved. 

Statistics in all its aspects is accordingly of interest and importance 
to a large number of classes of people. However, there are few if any 
individuals, including professional statisticians, who can be experts in 
all branches of statistics, because they would then need to be expert in 
many branches of knowledge, including the foundations of statistical 
science itself as well as the many fields of application. Statistics is both 
a science and an art. Statistics is a science because its methods are 
basically systematic and of wide application. Statistics is an art because 
success in its application is dependent on the skill, special experience, and 
knowledge of the person using it in the field to which the application of 
statistical methods is made. Such qualifications are necessary because 
the data collected in any field are the manifestations of persons or things 
with which the statistician needs a first-hand acquaintance. 

^ See the discussion of meteorology, page 7. 

It is, therefore, of importance that the author of any text in statistical 
methods make clear the purpose and scope of his book. 


STATISTICS IN THIS BOOK 

The traditional and popular notion of the function of a statistician 
is the collecting, tabulating, and describing of long records of figures. 
These records are conveniently summarized by the calculation of aver- 
ages, percentages, index numbers, and other descriptive measures, and 
by the construction of one or more of the kinds of tables, graphs, dia- 
grams, or charts. This process of reducing data to certain summary 
values has been greatly aided by high-speed machines for tabulation and 
calculation. The collection of experimental and other observational 
data is, of course, an indispensable part of the scientist's work. How- 
ever, the function of a statistician, as now recognized in many branches 
of science, goes far beyond the collection and processing of numerical 
data for descriptive purposes. The less widely known activities include 
his contributions to advances in mathematical statistics basic to the 
creation of tools of scientific value. These tools give precision to tests of 
scientific hypotheses. They also indicate how observational studies 
including experiments must be planned, whether under laboratory, 
factory, or field conditions, to provide the most reliable and valid infor- 
mation with the least expenditure of time, energy, and money. 

The emphasis of this book is on the interpretative rather than on the 
descriptive function of statistics. This book also aims to present the 
theoretical foundation of modern statistics, not as an end in itself but 
principally as a background for the intelligent application of 
modern statistical methods. The medium for developing an under- 
standing of the theoretical foundation is primarily empirical and logical, 
supplemented at times by mathematical formulation. The complete 
exposition of the mathematical theory of modern statistical methods is, 
however, beyond the scope of this volume. Such information would be 
of interest chiefly to mathematical statisticians, since a thorough under- 
standing of the theory of modern statistical methods requires a fairly 
advanced knowledge of pure mathematics. Until recently, the basic 
researches in the mathematical theory of statistics were rather widely 
dispersed among scientific journals, but books dealing principally with 
the mathematical theoretical foundation of modern statistics are now 
available (Refs. 1, 3, 4, and 7). Thus, although from the mathematical 
standpoint this book is not self-contained, it is written for readers without 
specialized mathematical training. 

The theoretical presentation in this book has been based, as it must 
be for present-day needs, on original and secondary sources of mathe- 
matical statistics. It is assumed that certain aspects of this theoretical 
background must be clearly understood if statistical methods are to be 
put to intelligent use. One basic conception is that one must know how 
to choose the most effective statistical tool for the purpose in mind. A 
second is that one must know the basic assumptions underlying the 
statistical tool selected. A third basic conception is that one must first 
test to see if the assumptions are fulfilled by the particular situation to 
which the tool is to be applied. By continuous emphasis in this text 
upon these requisites, it is proposed that the user of statistical methods 
will become habituated to the practice of critical examination and selec- 
tion rather than to applying statistical methods blindly or in a rule-of- 
thumb manner. 

Let us repeat: statistical method is based on the same fundamental 
ideas and processes as is the general scientific method. Thinking 
statistically is equivalent to thinking scientifically. This kinship under- 
lies the development of the principles of statistical methodology. The 
more complete understanding of scientific methods is a direct aim in the 
presentation of this text. Reasoning skepticism, scientific caution, and 
common sense are urgently needed in statistics. 

The more significant contributions to statistics since the early 1920's 
have been made in the development of the foundations for the problem of 
statistical inference. The principles of statistical inference deal with 
two chief problems: that of testing statistical hypotheses and that of 
statistical estimation. These, then, are the two fundamental statistical 
problems of the research worker. The presentation of the theoretical 
aspects of these two problems, with special emphasis on their practical 
aspects, constitutes the principal content of this book, which has been 
arranged with the view of presenting the main ideas underlying statistical 
inference in a logical developmental order leading to a functional under- 
standing of the principles. 

The concepts underlying probability and likelihood as they are used 
in statistics are given first, since probability theory plays the primary 
role in statistical inference. The fundamental theorems of direct prob- 
ability follow. We proceed with other theorems which, in turn, lead to 
the classical binomial, normal, and Poisson distributions. 

We then discuss the development of sampling theory and its use in 
problems of statistical inference. The selection of representative samples 
receives considerable emphasis in keeping with the requirements of 
present-day research. 

This background should prepare the student to understand the testing 
of statistical hypotheses. Many illustrations of current procedures of 
testing statistical hypotheses are presented. The problem of estimating 
parameters from sample values is then treated. The following are 
considered: the properties of “best” estimates; the form of the frequency 
distribution of observational values in relation to the most accurate 
estimates; and two methods of forming estimates—the method of maxi- 
mum likelihood and the method of interval estimation. Many original 
practical problems are worked out by way of illustration. 

The interpretative function in statistical analysis has been mentioned 
as one of major concern. The fact is, however, that the interpretation 
of a body of data requires a knowledge of how it was obtained. It is of 
equal importance that conclusions drawn from observational results be 
based on detailed knowledge of the procedures employed in the investiga- 
tion. Thus, the major function of a statistician is to design experiments 
and to plan investigations which will yield maximum information and 
valid conclusions. This responsibility of a statistician is stressed 
throughout. 


Considerable space has been allotted to the technique of the analysis 
of variance, the most powerful statistical tool yet devised for analyzing 
sources of variation. Modern experimental and sampling designs require 
this technique for the analysis of their results. Related problems such 
as those in regression are also included. 

A [...] understanding of the problems of the field in which one 
works is essential when statistical data from this field are [...] 
and analyzed. To develop statistical craftsmanship, one must acquire 
skill by observation and much practice. The aim of this book is to assist 
students and research workers who require technical aid in the design, 
execution, and interpretation of quantitative researches which may 
be [...] or in the field. This book is designed just as 
much [...] to become a competent critic of the research 
literature in his field. 

The content of this text is based largely on the theory and application 
of those [...] methods which are [...] groups of subject [...]. 
The specialized [...] of statistics involve [...] of structure; rather, [...] 
statistical processes illustrated in the examples. He is, therefore, invited 
to work through the numerous examples in all numerical detail, so that 
he may learn how to apply the same methods not only to the unsolved 
problems given in the text but also to those encountered in his readings 
and, above all, in his own research. 

Much care has been given to the practical arrangements of numerical 
calculations. The analysis of the results obtained from modern and 
original experimental designs has been given special attention. 

CHAPTER II 

PROBABILITY AND LIKELIHOOD 

The work of a scientist is in part practical: he designs experiments 
and makes observations. Another part of his labor is theoretical: he 
formulates conclusions from his experimental findings, compares his 
results with those of other workers, constructs a theoretical system so as 
to represent and order the facts of observation as accurately as possible, 
or notes their conformation to existing theory. With the aid of the theory 
he derives predictions, which he again validates by new observations. 

The Basis of Statistical Inference. In most, if not all, of these 
activities of the modern scientific worker, statistical methods play a 
significant part. If the experiment or investigation is to lead to explicit, 
unequivocal, and convincing results, it must be planned so that the data 
are capable of clear-cut statistical interpretation. The testing of under- 
lying assumptions, the drawing of inferences from sample to population 
or from observation to hypothesis, and the derivation of predictions are 
all based upon intelligent statistical analysis. 

One of the most hazardous acts of the research worker is the drawing 
of inferences or conclusions from experimental data. This act is a proc- 
ess of reasoning from the part to the whole, from sample to population, 
from the particular to the general, or from effect to cause. This step 
is difficult, it seems, because the experimental results pertain to the experi- 
ment or sample, whereas the inference or conclusion refers to the popu- 
lation, of which the experiment or sample is only a very small part. 

The inferences drawn from sample to population are uncertain. 
Even so, these inferences can be rigorous, because they may be made so 
as to include within themselves a quantitative specification of the kind 
and amount of uncertainty involved. Upon this achievement depends 
the validity of the process of acquiring new knowledge by observation or 
experiment. Science can progress by collecting new experiences as well 
as by the better ordering of those already possessed. It is primarily by 
the former process that new knowledge comes into being. 

The statistician's contribution to the problem of drawing conclusions 
from experimental results consists in (a) setting up the requirements for 
the design or the logical structure of the experiment and (b) interpreting 
the data. While these two aspects of the process of adding to scientific 
knowledge are closely related, our principal concern for the present is to 
consider the general problem of statistical inference. As has been noted 
in Chapter I, there are two chief problems of statistical inference: that 
of testing statistical hypotheses and that of estimation. Preliminary 
to the direct consideration of these problems, it is desirable to develop 
some fundamental ideas and theorems, which have their origin in prob- 
ability theory. The interpretation of experimental data is based on the 
application of probability theory. This theory is planned to provide the 
mathematical model of the empirical facts, that is, the data with which 
the statistician works. 

Setting up a Model. In looking for a solid theoretical foundation 
upon which to build a model, the statistician must make clear just how 
far the concepts which he uses are justified and are requisite. The 
justification of the logical system he develops rests upon the demonstra- 
tion of its usefulness in describing the results of experience. The events 
and objects of the world of reality are always very complex. The 
scientifically trained mind is required to identify the characteristic or 
salient point from among the vast number present as an essential condi- 
tion from the standpoint of theory. Because the objects of the world 
of reality cannot be comprehended in a way that could lead to an exact 
theory, they are superseded by idealized conceptions which can be com- 
prehended with comparative ease. The object of creating theoretical 
models is to permit the mental reconstruction of the world of empirical 
fact. This statement is not equivalent to saying that the theory necessi- 
tates putting the empirical facts into an inflexible predetermined scheme. 
On the contrary, the theoretical system must be constructed so that the 
facts are truthfully represented. A scientific theory may be abstract 
not only in that it encloses a collection of selective facts but also in that it 
covers a set of ideal objects, such as wave function in physics and the 
plane in geometry. Yet when such theories encompass real objects 
to close approximations, they may serve a useful purpose. The statis- 
tician begins his work in developing efficient working tools for the research 
worker by building a simplified model by which he proposes to represent 
the phenomena of observation with reliability sufficient to supply useful 
results. 

Statistical Interpretation of Probability. The principal function of 
statistics is to describe certain characteristics of mass phenomena 
and repetitive events. From the theoretical point of view, unlimited 
sequences of events or of similar observations are referred to as statistical 
universes or populations or collectives. Much of theoretical statistics is 
built up around the idea of an infinitely large hypothetical population of 
which the observational data make up a sample. The idea of an infinite 
parent population from which samples are taken is a mathematical 
abstraction. Populations with which we deal in practice are finite. The 
infinite population may be considered as a limiting case of a finite popu- 
lation when the number of individuals increases indefinitely. In experi- 
mental work, also, a hypothetical infinite population may be considered 
as an infinite population of all experiments that might have been carried 
out under the conditions of an observed experiment. The individual 
experiment is interpreted as a random selection from the infinite popula- 
tion so defined. 

A population is an aggregate of individuals. The individual case 
is of interest to the statistician chiefly because it is from the collection of 
individuals that the characterization of the population becomes possible. 
Even if the interest were in the individual case, information would need 
to be collected for thousands of individuals, and perhaps no other eventual 
use of them would be made besides combining them under a single sta- 
tistical generalization. Although in this treatment the identity of any 
particular individual is irretrievably lost in the aggregate, it does not 
follow that we cannot say anything about the individual from the 
knowledge we have of the population. Take, for instance, the frequency 
distribution of the ages of the 24,395 high-school graduates as recorded 
in Table 45, page 202. Let us take a single individual from the group or 
population of 24,395. Even though we do not know his age, we know 
that he will be an exceptional individual with respect to age if he is 
less than, say, sixteen years. It can be said that he will be one of 
84/24,395ths of the group. He will, of course, more likely be one of the 
12,148/24,395ths of the group. It is more convenient, when dealing with 
problems of this type, to use a term commonly called odds or probability. 
In the illustration just cited, it can be said that the odds or probability of 
any one individual’s being less than sixteen years at the date of graduation 
from high school is 84/24,395 = .0034, and the probability of his age being 
eighteen is 12,148/24,395 = .498. This interpretation of probability 
is the one usually accepted in modern statistics; that is, probability is the 
ratio of frequencies. As in this illustration, so in any frequency distri- 
bution: statistical probability may be considered as the means by which 
the characteristics of the whole distribution may be ascribed to the 
random individual. 
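
The ratio-of-frequencies interpretation can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function name `relative_frequency` and the constant names are hypothetical, and only the three counts are taken from the passage above.

```python
from fractions import Fraction

# Counts quoted in the text for the 24,395 high-school graduates.
TOTAL_GRADUATES = 24_395
UNDER_SIXTEEN = 84
AGED_EIGHTEEN = 12_148


def relative_frequency(count: int, total: int) -> Fraction:
    """Probability of a class, interpreted as the ratio of frequencies."""
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("count must lie between 0 and a positive total")
    return Fraction(count, total)


p_under_sixteen = relative_frequency(UNDER_SIXTEEN, TOTAL_GRADUATES)
p_eighteen = relative_frequency(AGED_EIGHTEEN, TOTAL_GRADUATES)

# Rounded to the precision used in the text: .0034 and .498.
print(f"P(age < 16) = {float(p_under_sixteen):.4f}")
print(f"P(age = 18) = {float(p_eighteen):.3f}")
```

Exact fractions are used so that the rounding to .0034 and .498 happens only at the point of display.
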

The long-standing controversy over the nature and meaning of 
probability need not detain us here. We may merely mention that the 
psychological and subjective interpretation should be kept distinct from 
the objective or operational interpretation of relative frequencies. 
Probability is associated with our subjective sense of expectancy just as 
[...] with our subjective sense of heat and [...] 
[...]ies from given data on the basis of [...] 
[...] probabilities is objective [...] 
[...]table to most modern [...] 


Two definitions of probability may be cited here. (1) Von Mises 
(Ref. 3) defines the probability of an event as the limit of the relative 
frequency of this event in an infinite sequence of trials, the Kollektiv, 
fulfilling certain specified conditions. In a purely mathematical sense, 
the existence of this limit is assumed to be axiomatic. (2) Kolmogoroff 
(Ref. 2) gives the most comprehensive discussion of probability from the 
standpoint of measure. He defines probability as a set function which 
fulfills a certain system of axioms. This theory starts with the concept 
of the frequency ratio but does not postulate that definite limits of 
frequency ratios exist. It builds around the concept of a random vari- 
able, that is, by considering the probability of an event as a number 
connected with the event. The axioms postulated in the theory express 
the principles for operating with the numbers. With respect to applica- 
tion, the two theories are largely equivalent. However, the limiting 
properties of frequencies involved in definition (1), rather than the pure 
mathematics of abstract ensembles occurring in definition (2), will be 
accepted as the basis of the frequency theory of probability insofar as it 
is used in our present discussion. 
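
For reference, the system of axioms mentioned under definition (2) is usually written as follows in modern notation. This is a sketch in standard textbook form rather than a quotation from Ref. 2; the symbols $\Omega$ (the set of all possible outcomes), $\mathcal{F}$ (the class of events), and $A_i$ do not appear in the text and are assumptions of this note.

```latex
\begin{align*}
&\text{(i)}   && P(A) \ge 0 \quad \text{for every event } A \in \mathcal{F}, \\
&\text{(ii)}  && P(\Omega) = 1, \\
&\text{(iii)} && P\Bigl(\,\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i)
               \quad \text{for mutually exclusive events } A_1, A_2, \ldots
\end{align*}
```

Nothing in these axioms refers to limits of frequency ratios, which is the point of contrast with definition (1).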

Thus, the true probability, $P_{12}$, of getting a double 6, or sum 12, in 
one throw of two dice is defined as $\lim_{n \to \infty} n_{12}/n$, assuming that the limit exists, 
where $n_{12}$ is the number of times a score of 12 is obtained in $n$ throws 
of the two dice. Similarly, probability values can be determined for 
each of the other possible totals. On a priori grounds, a tentative or 
hypothetical probability could be assigned to the true probability. 
However, probability in the sense used here in statistics depends for its 
meaning on aggregates of phenomena or repeated events. Although 
the value of $P_{12}$, for instance, can never be reached in practice, it can be 
attained within an arbitrary degree of certainty by making $n$ sufficiently 
large. According to a theorem by James Bernoulli, the probability that 
the relative frequency $n_{12}/n$ will be adjacent to $P_{12}$ is arbitrarily near to 1 
for a sufficiently long sequence of trials. 

EXAMPLE 1. An Experiment in Probability. We shall illustrate
some of the main points in probability theory by considering an
experiment consisting of the throws of a pair of dice. This experiment was
repeated a large number of times. The sequence of throws of the pair
of dice gives rise to a sequence of numbers, the variable consisting of the
sums of the several combinations of the two sets of dots on the two upper
faces of the dice after each throw, that is, 2, . . . , 12. The conditions
of each throw were kept as uniform as possible. The systematic record
of the results of sequences of this kind constitutes a set of statistical data
relative to the events observed. Six sets of data, resulting from 36, 360,
3,600, 36,000, 180,000, and 360,000 throws, are recorded in Table 1.
The data are arranged in frequency distributions which show the number
and per cent of occurrences for each of eleven possible events, 2, . . . , 12.
The hypothetical or theoretical distribution arrived at on a priori grounds
is also recorded in the first main column.

The probability of getting a 12 in the throw of two dice shows some
fluctuation in the six series of experiments, ranging in value from .028 to
.039. A similar situation holds for each of the other totals. The true
probability of getting a 12, $\lim_{n \to \infty} n_{12}/n$, though never
reached in practice, can be approached closer and closer by increasing the
size of $n$. On this basis, the value .029 determined by 360,000 throws
would give the best approximation; likewise for the other totals. In this
way probability statements are based on the results of empirical
investigations.
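The behavior described in Example 1 can be imitated with a pseudo-random number generator. The following is only a minimal sketch: the function name, the fixed seed, and the use of Python's `random` module in place of physical dice are assumptions made for illustration, and the printed frequencies will not reproduce Table 1.

```python
import random


def relative_frequency_of_12(n_throws, rng):
    """Throw two simulated dice n_throws times and return n_12 / n."""
    hits = sum(
        1
        for _ in range(n_throws)
        if rng.randint(1, 6) + rng.randint(1, 6) == 12
    )
    return hits / n_throws


rng = random.Random(2024)  # fixed seed (an assumption) so runs are repeatable
for n in (36, 360, 3_600, 36_000, 360_000):
    # The relative frequency should settle near 1/36 as n grows.
    print(n, round(relative_frequency_of_12(n, rng), 4))
```

For short series the relative frequency fluctuates noticeably; for long series it stays close to the a priori value 1/36, or about .028.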

The erratic or haphazard behavior of the fluctuations of the variable 
from throw to throw is usually spoken of as randomness. Even with the 
utmost care in keeping all relevant factors under control, the results 
vary from observation to observation in such an irregular way that exact 
prediction of any single event is impossible. The sequence may, there- 
fore, be called a sequence of random experiments. It is noted, however, 
that, in spite of the unpredictable behavior of individual results, the 
average results of long sequences of the random experiments exhibit a 
striking regularity; this regularity may be inferred from the similarities 
among the several percentage frequency distributions. It is this phe- 
nomenon that serves as the basis for the mathematical theory of statistics. 

The hypothetical value of probabilities may at times be very useful in 
furnishing clues to true probabilities. 

We may use the theoretical values of P to determine the mathematical 
expectation, a concept that will be encountered later in sampling theory. 
The mathematical expectation of any quantity is the sum of all the values 
it may assume multiplied by their respective probabilities: 


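$$E(X) = P_1X_1 + P_2X_2 + \cdots + P_nX_n = \sum_{i=1}^{n} P_iX_i. \qquad (2.01)$$

Formula (2.01) shows that the mathematical expectation is the weighted
arithmetic mean of a variable where the different probability values,
$P_i$, provide the weights. The mathematical expectation of the throws of
two dice is given by:

$$
\begin{aligned}
E(X) &= \tfrac{1}{36}(2) + \tfrac{2}{36}(3) + \tfrac{3}{36}(4) + \tfrac{4}{36}(5) \\
     &\quad + \tfrac{5}{36}(6) + \tfrac{6}{36}(7) + \tfrac{5}{36}(8) + \tfrac{4}{36}(9) \\
     &\quad + \tfrac{3}{36}(10) + \tfrac{2}{36}(11) + \tfrac{1}{36}(12) = 7
\end{aligned}
\qquad (2.02)
$$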
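The arithmetic in (2.02) can be checked by enumerating the 36 equally likely outcomes of two dice. This is a sketch under the a priori assumption of unbiased dice; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each total.
counts = {}
for a, b in product(range(1, 7), repeat=2):
    counts[a + b] = counts.get(a + b, 0) + 1

# Theoretical probabilities P_i for the totals 2, ..., 12.
probs = {total: Fraction(c, 36) for total, c in counts.items()}

# Formula (2.01): the sum of each value times its probability.
expectation = sum(p * x for x, p in probs.items())
print(expectation)  # 7
```

Exact fractions are used so that the result is exactly 7 rather than a rounded decimal.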


Summary. In the interpretation of probability statements on the
basis of relative frequencies, the following points are essential (Ref. 4):


(1) The probability of an event has meaning only when the individual
event is an element of the specified reference class.

(2) The objective values involved in probability statements grow
out of their determination through empirical investigations.

(3) Since probability relates to the property of an object in a specified
reference class, a given property can be associated with various
degrees of probability referred to different reference classes.

(4) The direct evidence for probability statements is statistical in
character, since the definition of such statements is explicitly
stated in terms of relative frequencies. There are, however,
cases where indirect evidence provides estimates of the
probabilities and validation of them: for example, when probability
statements are a part of a system of statements.

(5) Every probability statement defined as the limit of a relative
frequency is a hypothesis which is incapable of complete
confirmation or final verification by means of the finite evidence
available at any specified time.

(6) Probability statements to be used successfully for specifying the
occurrence of designated properties in definite classes with stable
relative frequencies are not dependent upon “deterministic” or
“indeterministic” issues.


Fundamental Theorems of Direct Probability. The function of the
calculus of probability is to derive probabilities of compound events from
sets of initially given probabilities. Thus, in the example of dice casting,
given above, the probability of throwing a 6 with a die is not a problem
in the calculus of probability; but, given this probability, the probability
of getting 12 in the throwing of two dice is such a problem. It should be
recognized that the propositions asserted in the calculus are only analytic
of the definitions and rules originally specified, as in the case of
demonstrative geometry, for instance. The probability calculus thus makes
possible the derivation of relative frequencies with which certain events
occur from the initial probability statements without the specification in
the statements of what the actual frequencies are. In thus making
definite the predictions which the probability statements involve, the
calculus enables us to make the check of statement content. In this
section a few of the standard rules regulating the calculus of direct
probabilities will be given. Most of the science of statistics is built upon
the explicit or implicit application of these fundamental rules (Refs. 1,
4).




It is assumed, to begin with, that probability is measurable on a
continuous scale. Thus, a probability is a real number, and any two
measures of probabilities are comparable, that is, $P_1 > P_2$,
$P_1 = P_2$, or $P_1 < P_2$.

The probability of a proposition A on data R is written

$$P\{A \mid R\}$$

Thus, we may state as

Rule 1. If R entails A, $P\{A \mid R\} = 1$.
If R entails not-A, $P\{A \mid R\} = 0$.

Thus, if an event is certain to happen, its probability is 1; if it is
certain not to happen, its probability is 0. The range on the probability
scale is from 0 to 1. Any value between these limits is, therefore, a
positive proper fraction.

Rule 2. If $P_1, P_2, \ldots, P_n$ are the probabilities of n mutually
exclusive propositions $A_1, A_2, \ldots, A_n$ on data R, then the
probability that one of the propositions is true is
$P_1 + P_2 + \cdots + P_n$. Symbolically:

$$P\{A_1 \text{ or } A_2 \text{ or } \cdots \text{ or } A_n \mid R\} = P\{A_1 \mid R\} + P\{A_2 \mid R\} + \cdots + P\{A_n \mid R\}$$

Thus, if one ball be drawn from a bag containing four white, five
black, and seven red balls, since the chance of its being white is 4/16 and
of its being black is 5/16, the probability of its being either white or
black is 9/16.
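The bag example for Rule 2 can be verified with exact fractions. A minimal sketch; the dictionary and variable names are illustrative and not part of the text.

```python
from fractions import Fraction

# Contents of the bag in the Rule 2 example.
bag = {"white": 4, "black": 5, "red": 7}
total = sum(bag.values())

# Probability of each colour on a single draw.
p = {colour: Fraction(count, total) for colour, count in bag.items()}

# White and black are mutually exclusive, so their probabilities add.
p_white_or_black = p["white"] + p["black"]
print(p_white_or_black)  # 9/16
```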

Rule 3. The probability of two propositions A and B on data R is
the product of the probability of A given R and that of B given A and R.
Symbolically,

$$P\{AB \mid R\} = P\{A \mid R\}\, P\{B \mid AR\}$$

More generally,

$$P\{A_1A_2 \cdots A_n \mid R\} = P\{A_1 \mid R\}\, P\{A_2 \mid A_1R\}\, P\{A_3 \mid A_1A_2R\} \cdots P\{A_n \mid A_1A_2 \cdots A_{n-1}R\}$$

Thus, the probability of drawing a second white ball from a bag
containing five white and four black balls, the ball first drawn being
returned before the second drawing, is 5/9 × 5/9, or 25/81.

The probability of becoming a total orphan is the product of the 
probabilities of being bereaved of father and of mother. 
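The product rule can be illustrated with the bag of five white and four black balls. The sketch below treats two cases: the first ball returned before the second drawing, as in the text, and the first ball not returned, which is added here only as a contrast to show how $P\{B \mid AR\}$ depends on A. All names are illustrative.

```python
from fractions import Fraction

white, black = 5, 4
n = white + black

# P{A|R}: the first ball drawn is white.
p_first_white = Fraction(white, n)

# Ball returned before the second drawing: P{B|AR} equals P{B|R}.
p_two_white_with_return = p_first_white * Fraction(white, n)

# Ball not returned (an added contrast): one white ball fewer remains.
p_two_white_no_return = p_first_white * Fraction(white - 1, n - 1)

print(p_two_white_with_return)  # 25/81
print(p_two_white_no_return)    # 5/18
```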


The rules for the logical sum of events (Rule 2) and for the logical 
product (Rule 3) are basic in the elementary calculus of probability. 
From them, by the application of the ordinary rules of logic and arith- 
metic, it becomes possible to derive significant consequences. One such 
derivation is Bayes’s theorem, which, from the consequences drawn from 
it, often plays a conspicuous part in treatments of the foundations of 
probability and scientific method. Symbolically, it may be stated as 

$$P\{A_i \mid RH\} \propto P\{A_i \mid H\}\, P\{R \mid A_iH\}$$

That is, the probability of $A_i$, given R and H, is proportional to the
probability of $A_i$ given H, multiplied by the probability of R, given
$A_i$ and H. The factor on the left, that is, $P\{A_i \mid RH\}$, is called
the posterior probability; the first factor on the right,
$P\{A_i \mid H\}$, the prior probability; and the remaining factor,
$P\{R \mid A_iH\}$, the likelihood.
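A small numerical case may make the proportionality concrete. In this sketch the three hypotheses (a chance of success of 0.3, 0.5, or 0.7), the equal prior probabilities, and the data (7 successes in 10 trials) are all assumptions chosen for illustration; they do not come from the text.

```python
from math import comb


def posterior(priors, likelihoods):
    """Multiply prior by likelihood, then scale so the results sum to 1."""
    unnormalized = [pr * lk for pr, lk in zip(priors, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


hypotheses = [0.3, 0.5, 0.7]   # hypothetical values of the chance of success
priors = [1 / 3, 1 / 3, 1 / 3]  # equal prior probabilities (an assumption)

# Likelihood of the data R (7 successes in 10 trials) under each hypothesis.
likelihoods = [comb(10, 7) * p**7 * (1 - p) ** 3 for p in hypotheses]

for p, post in zip(hypotheses, posterior(priors, likelihoods)):
    print(p, round(post, 3))
```

The division by the total supplies the constant of proportionality that the symbolic statement leaves implicit.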
In order to make any practical use of Bayes’s theorem, it is necessary
to decide on the values to be ascribed to the prior probabilities. Bayes
and Laplace postulated that, in the absence of definite knowledge, the
antecedent probabilities should be assumed to be equal. This postulate has
been relentlessly attacked, especially in recent years, by statisticians on
the grounds that it supplies by hypothesis data unavailable through
empirical or, more particularly, statistical investigations.

The Principle of Maximum Likelihood. Since in most cases it is
practically impossible to assign values of empirical significance to the
prior probabilities in Bayes’s theorem, the theorem has only a limited
use. Therefore, it plays a very minor role as a means for determining the
probability of a given hypothesis on the grounds of the available evidence.

Statisticians who reject Bayes’s postulate supplant it with a different
principle based on the use of likelihood. That is, for any $A_i$ and H,

$$P\{A_i \mid RH\} \propto P\{A_i \mid H\}\, L(R \mid A_iH),$$

where the factor $L(R \mid A_iH)$ stands for the likelihood function.

The principle of maximum likelihood states that, when the problem of
choosing from a number of hypotheses, $A_i$, arises, we are to choose the
one (assuming it exists) that maximizes $L(R \mid A_iH)$. That is, we are
to select the hypothesis which gives the maximum probability of the
observed event.
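The principle can be sketched for a simple case. Here the hypotheses are candidate values of a chance of success on a grid from 0.01 to 0.99, and the observed event is 7 successes in 10 trials; both the grid and the data are assumptions made for illustration.

```python
from math import comb


def likelihood(p, successes, trials):
    """Probability of the observed count of successes if the chance is p."""
    return comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


# Candidate hypotheses A_i: the chance of success is one of 0.01, ..., 0.99.
candidates = [i / 100 for i in range(1, 100)]

# Select the hypothesis that makes the observed event most probable.
best = max(candidates, key=lambda p: likelihood(p, 7, 10))
print(best)
```

In this case the selected value coincides with the observed relative frequency of successes, 7/10.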

Other Theorems in the Calculus of Probability. The previous rules
governing direct probability calculations are based on the assumption
that the relative frequency of a proposition referred to a specified class
of objects or events has a limit. There are other theorems in the calculus
which require the fulfillment of additional assumptions. One of these
is that the condition of irregularity obtains in the reference classes. This
condition is known as a random character. It may be spoken of here
as a method of selection which affords an equal probability to certain
propositions and thus permits the application of the calculus of
probability a priori.

The irregular Kollektiv, by which is meant an infinite sequence of
observations, is the foundation of the mathematical theory of probability
advanced by von Mises. The condition of randomness, or impossibility
of a gambling system, which the Kollektiv must satisfy, means that if the
relative frequency of a particular attribute is calculated in a
subsequence of the Kollektiv, selected by some method which is independent
of the Kollektiv itself, it must tend to the same limit as it does in the
original Kollektiv. Randomness is fundamental in the theory of sampling
to be discussed later, since the theory deals principally with samples
generated by such processes.

The Binomial Distribution. For reference classes satisfying the
condition of random character, the following can be shown: If the
probability of having a specified property, say S, called “success” is p,
and the probability of not having it is q = 1 − p, then the numerical value
of the probability that exactly t elements in a set of n elements (where
t ≤ n) have the property S while the remaining n − t elements do not have S
is given by

$$P_t = \frac{n!}{t!\,(n-t)!}\, p^t q^{n-t}, \qquad t = 0, 1, 2, \ldots, n; \qquad (2.03)$$

where p = p(S) = constant for the group of trials.

This important theorem is termed the binomial law. $P_t$ is the general
term in the binomial expansion of

$$(q + p)^n$$

The value of $P_t$, where p and n are fixed, varies with t. Its maximum
value is given when t satisfies the condition

$$pn + p \ge t \ge pn + p - 1 \qquad (2.04)$$

When n is very large, the value of t which gives a maximum may be
taken as pn. This value indicates that the probability of sets with n
successive elements which contain exactly t elements with the property S
is largest when t is approximately equal to pn, or that the proportion of
S in a set of n elements is approximately equal to the limit of the
relative frequency of S in the Kollektiv.
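Formula (2.03) and condition (2.04) can be checked numerically. The sketch below assumes n = 36 trials with p = 1/6, values chosen only for illustration.

```python
from math import comb


def binomial_pmf(t, n, p):
    """P_t of (2.03): probability of exactly t successes in n trials."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)


n, p = 36, 1 / 6  # illustrative values, not taken from the text
probs = [binomial_pmf(t, n, p) for t in range(n + 1)]

# The value of t with the largest probability.
most_probable_t = max(range(n + 1), key=lambda t: probs[t])

# Condition (2.04): pn + p >= t >= pn + p - 1.
print(most_probable_t, p * n + p - 1, p * n + p)
```

Here pn = 6, and the most probable number of successes falls within the bounds given by (2.04).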

Equation (2.03) is a special case of a more general theorem dealing
with situations in which not only two results are considered but in which
the event may occur in k ways with probabilities $p_1, p_2, \ldots, p_k$.
Then, for a random sample of N from a multinomial distribution, it can be
shown that the probability P of N giving $n_1$ of the first kind, $n_2$ of
the second, . . . , $n_k$ of the last, is

$$P = \frac{N!}{n_1!\, n_2! \cdots n_k!}\, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k} \qquad (2.05)$$

which is the general term in the multinomial expansion of

$$(p_1 + p_2 + \cdots + p_k)^N, \qquad N = n_1 + n_2 + \cdots + n_k.$$
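Formula (2.05) translates directly into a short function. As an illustration, the sketch reuses the bag of four white, five black, and seven red balls and assumes three drawings with each ball returned before the next drawing; that sampling scheme is an assumption added here, not part of the text.

```python
from fractions import Fraction
from math import comb, factorial, prod


def multinomial_pmf(counts, probs):
    """Formula (2.05): probability of n_1, ..., n_k occurrences of each kind."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)  # exact: the running quotient stays an integer
    return coefficient * prod(p**c for p, c in zip(probs, counts))


probs = [Fraction(4, 16), Fraction(5, 16), Fraction(7, 16)]

# One ball of each colour in three drawings with replacement.
print(multinomial_pmf([1, 1, 1], probs))  # 105/512
```

With k = 2 the function reduces to the binomial term of (2.03).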


The Poisson Distribution. An important distribution of the
discontinuous type which often describes the facts of observations is one
where p, or the probability of an event, is very small, but where a large
number of cases or trials, n, are taken so that pn is finite but small.
The number of occurrences will be distributed in the Poisson series.
Thus,

$$p \to 0, \qquad n \to \infty, \qquad np \text{ remains finite} = m = \text{mean}$$

It can be shown that, for the Poisson distribution,

Mean = m
Variance = npq → m

The distribution is therefore determined by one parameter.
If t = 0, 1, 2, . . . , the relative frequency with which the values
occur is given by the series

$$e^{-m},\ \frac{m e^{-m}}{1!},\ \frac{m^2 e^{-m}}{2!},\ \ldots,\ \frac{m^t e^{-m}}{t!},\ \ldots \qquad t = 0, 1, 2, \ldots \qquad (2.06)$$

This series is known as “Poisson’s limit to the binomial,” “the Poisson
series,” or “the law of small numbers.” Probability tables for the
distribution are given by Pearson (Ref. 5).
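The sense in which (2.06) is a limit to the binomial can be seen by comparing the two term by term. In this sketch n = 1,000 and p = .002 (so that m = 2) are illustrative assumptions.

```python
from math import comb, exp, factorial


def binomial_pmf(t, n, p):
    """Exact binomial probability of t occurrences in n trials."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)


def poisson_pmf(t, m):
    """General term of the Poisson series (2.06) with mean m."""
    return m**t * exp(-m) / factorial(t)


n, p = 1000, 0.002  # small p and large n (illustrative values)
m = n * p           # mean of the limiting Poisson series

for t in range(6):
    print(t, round(binomial_pmf(t, n, p), 5), round(poisson_pmf(t, m), 5))
```

The two columns agree to about three decimal places, although the Poisson terms require only the single parameter m.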

The Normal Distribution. The binomial law basic in the theory of
probability is exact, but it possesses the distinct disadvantage of
involving much labor, particularly in the computation of the factorials
that enter in the term $P_t$ [see (2.03)] when n is large. Furthermore, it
is a theoretical distribution of the discontinuous or the discrete form.
When the character is continuous, as is very often the case in
measurements in science, a curve is essential in describing such continuous
variation.

It can be shown that by a series of approximations an analytic formula
can be obtained from Equation (2.03) which takes the form

$$P_t = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\delta^2 / 2\sigma^2}, \qquad \text{where } \delta = t - np,\ \sigma = \sqrt{npq} \qquad (2.07)$$

and the graph of $P_t$ as a function of $\delta$ is a symmetrical,
bell-shaped curve variously called the normal distribution curve, the
Gaussian curve, or the Laplacian-Gaussian error curve. Since the maximum
value of the exponential $e^{-x}$, for $x \ge 0$, is unity, it is noted
that the normal approximation for the probability that t will assume its
most probable value is given by $\frac{1}{\sigma\sqrt{2\pi}}$, or in terms
of the binomial parameters, $\frac{1}{\sqrt{2\pi npq}}$, where q = 1 − p.
It is obvious that the normal approximation gives the closest fit to the
binomial when p = q (see page 58).

In addition to the normal curve being the limiting form of the binomial
distribution, as well as of certain other distributions, its usefulness in
theory and practice is especially enhanced by the central limit theorem.
According to this theorem, under certain conditions the sum of n
independent random variables, in whatever form they may be distributed,
tends to be distributed, when expressed in standard measure, as the
normal distribution when $n \to \infty$. Another important property of the
normal distribution is its reproductive property. For example, a linear
function of variates that are normally distributed is itself normally
distributed.
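The closeness of the approximation (2.07) to the exact binomial term (2.03) can be examined numerically. The sketch assumes n = 100 and p = q = .5, the case in which the fit is said to be closest; these values are illustrative only.

```python
from math import comb, exp, pi, sqrt


def binomial_pmf(t, n, p):
    """Exact term P_t of (2.03)."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)


def normal_approx(t, n, p):
    """Approximation (2.07) with delta = t - np and sigma = sqrt(npq)."""
    sigma = sqrt(n * p * (1 - p))
    delta = t - n * p
    return exp(-(delta**2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


n, p = 100, 0.5  # illustrative values with p = q
for t in (40, 45, 50, 55, 60):
    print(t, round(binomial_pmf(t, n, p), 5), round(normal_approx(t, n, p), 5))
```

At the most probable value t = np the approximation equals $1/\sqrt{2\pi npq}$, since the exponential factor is then unity.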

One of the earliest applications of probability was to the systematiza- 
tion of measurements and observations in the physical sciences, particu- 
larly astronomy. Legendre in 1806 had formulated what has become 
known as the principle of least squares: When a set of empirical observa- 
tions is used to establish the constants of a mathematical function, the 
best solution is that which reduces the sums of the squares of the residual 
errors to a minimum. This principle was later placed on a definite
mathematical and logical basis by the work of Gauss, Laplace, Maxwell, 
and others. That is, the normal curve, although previously formulated 
by de Moivre, was developed as a useful mathematical tool. Since it 
was used by Gauss to describe the distribution of “errors,” it was spoken 
of as the “normal curve of error.” The distribution curve is useful,
however, in many situations which have nothing to do with “errors,” as in
the original setting in which it was used. It is useful in dealing with 
variations of different kinds, especially with experimental and other 
observational results, as in the biological sciences. 

Serious attempts were made, particularly by Quetelet, to apply the 
theory of probability to social statistics. He popularized the idea of the 
“average man” as computed from extensive statistics which he collected. 
It was through analogy of the average man to the center of gravity in 
mechanics that he assumed human actions or traits as occurring in accord- 
ance with the operation of laws giving rise to a normal distribution. 
Unfortunately, this attempt, although the influence of Quetelet soon 
became very slight, seemed to have established the use of the term 
“normal” in connection with a law of distribution presuming that 
measurements should always be expected to follow the “normal law of 
errors” as if it were a law of nature. Though later developments have 
shown that in science the normal curve gives at times a very close 
approximation to the observed facts, these instances of very close 
approximations are the exception rather than the rule. 

In dealing with the distributions of errors of measurement or
observation, the normal law of error was derived under the assumption that
deviations from the most probable value are fortuitous, meaning that
the forces in operation to produce them could not be resolved into more
elemental factors. It was assumed that the deviations were as likely to
be positive as negative, and that they varied without limit, that is, within
the bounds of $\pm\infty$. Laplace’s generalization was that the
distribution obtained by the repetition of a great number of identical
alternatives is represented by the function $e^{-x^2}$, such that the
ordinates of the normal curve decrease on both sides of the maximum
ordinate in such a way that their logarithms are proportional to the
squares of the distances from the center. Extending this idea to
fluctuations other than so-called “error,” it may be said that, if an
observation, say x, is a resultant of the sum of the effects of a large
number of small causes operating at random, and if each effect is
independent of x, the obtained distribution is expected to be normal.

The normal distribution holds a central place in the theory of sampling 
as well as in the theory of probability. 


PROBLEMS

Exercises 1-9 are based on the assumption of a normally distributed
population.

1. What proportion of the total number of cases lies between one and
two standard deviations (S.D.) above the mean?

2. What is the probability of obtaining a value of the variate in
random selection at least as large as +1.96 S.D.?

3. What proportion of the area under the normal curve lies between
1.27 S.D. and 1.33 S.D.? lies above 1.3 S.D.? lies above −1.3 S.D.?
lies below 2.1 S.D.?

4. What is the probability that a measure will lie in the range 2.5 S.D.
to 3.1 S.D.?

5. What is the probability of obtaining an absolute value of $x/\sigma$
greater than 1.5?

6. What is the relative length of the ordinate cutting off the lowest
12.1 per cent of the area?

7. A variate is normally distributed with mean 13.5 and S.D. 3.6. (a)
What measures selected at random might be expected to occur in not
more than 5 per cent of the cases? in not more than 1 per cent of the
cases? (b) What is the probability of obtaining a value of 15? of 8?

8. A variable is normally distributed with unit standard deviation.
The probability of obtaining a value of 15 or greater from the
population is .132. What is the mathematical expectation of the means of
random samples?

9. A population has a mean of 37.6. It is found that 95 per cent of the
values of the variate lie in the range 27.8 to 47.4. What values of
the variate will occur with a probability of .01 or less?

10. Insofar as the theory of statistics is concerned, upon what does the
concept of probability depend for its meaning?

11. In throwing a die, the event X takes on values 1, 2, 3, 4, 5, and 6.
If the die is unbiased, show that E(X) is 3.5, E(X²) is 15.167, and
the standard deviation of X is 1.708.

12. In an example of the Poisson distribution given by Bortkiewicz,
the records of 20 army corps over a period of 10 years
furnish 200 observations of the number of men killed by the kick of a
horse. If the number of deaths is denoted by the variable X, which
takes the values 0, 1, 2, 3, and 4 with frequencies 109, 65, 22, 3, and 1,
show that the mean is approximately equal to the variance. Find the
theoretical frequencies.
13. Calculate the frequency of girls in 100 families of 3 children each; 
= .49. 
14. Find the number of different committees, each of 3 persons, that can 
be selected from 5 individuals. 
CHAPTER III
SAMPLING DISTRIBUTIONS 


If the tools designed by the mathematical statistician are to be used
intelligently and efficiently by the research worker, the former cannot
evade the responsibility of setting forth clearly and unequivocally the
conditions under which the use of each tool is valid and efficient. Where
the statistician has done his part, it is the responsibility of the research
worker to determine whether the necessary conditions obtain in his
particular case. It should be pointed out that other tools are generally
required to test whether or not these conditions hold good. The
command of these tools is an indispensable part of the researcher's art.
Once it has been established that the assumptions have been fulfilled, he
can proceed with confidence in the results.

So that the student may gain an insight into the logic and reasoning
underlying the problems of drawing valid conclusions from experimental
results, we present a number of commonly used models developed by the
statistician for such purposes. It should be emphasized that the ability
to distinguish the specific use or uses for each of the models will go a
long way toward developing the kind of statistical craftsmanship essential
in the modern research worker.

Preliminary Notions on Sampling and Inference. The material out
of which the statistician constructs his model for practical use in
interpreting experimental results is discovered by noting what happens when
sample after sample is taken from the same population. It is noted, of
course, that the results usually differ from one sample to another. Since
the method of selection is kept uniform throughout the sampling process,
these discrepancies can logically be assigned only to the process, because
clearly the population remains constant. It is proper, therefore, to
speak of the fluctuations from sample to sample as sampling errors.
These sampling or chance errors, as they are sometimes called, are found
to follow chance laws, that is, though all together they form a uniform
result, the value any one might have cannot be accurately predicted.
The individual deviations are unanalytic; that is, the forces operating
to bring them about are incapable of resolution into simpler and
identifiable components. Out of these sampling errors the statistician makes
his model. Against such a standard it becomes possible to compare the
experimental results. Since it is possible to measure the amount of
sampling error to be expected in any given case, it is necessary only to
note whether the experimental results conform with the standard,
that is, to compare the relative magnitude of the experimental results and
their random sampling errors. In this comparison, if we note that the 
observed results, namely, the estimate of an effect presumed to exist, 
could seldom (say once in a thousand trials, once in a hundred, or once 
in twenty) be as large or larger owing to random errors of sampling 
alone, then the effect is said to be real in the sense that it is not likely 
to be due to sampling errors alone, and the experimental results are said 
to be significant. On the other hand, if it is found that often (for instance, 
fifty times in one hundred, one time in five, or even once in ten, and so 
forth) results as large or larger could be obtained that would be attribut- 
able to random sampling errors alone, they are said to be insignificant. 
Ordinarily, the basis of determining whether results are significant or 
insignificant is as follows: 

(1) The results are said to be significant if the conclusion that they 
are would be erroneous in 1 per cent or less of the cases. 

(2) The results may be significant but further observations are neces- 
sary (that is, we suspend judgment) if the conclusion that the 
results are significant would be wrong in 5 per cent or less but 
more than 1 per cent of the cases. 

(3) The results are not significant if our conclusion that they are
significant would be in error in more than 5 per cent of cases.


The technical term for the process employed in examining the sig- 
nificance of experimental or observational results is “the test of sig- 
nificance.” This process will be discussed much more completely in 
Chapter IV, The Testing of Statistical Hypotheses. 
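The three-way convention listed above can be written as a small decision function. This is a minimal sketch, assuming that the probability of obtaining a result as large or larger from random sampling errors alone has already been found; the function name, the argument name, and the returned labels are illustrative and not from the text.

```python
def classify_result(chance_probability):
    """Apply the 1 per cent and 5 per cent convention to a probability.

    chance_probability: probability that sampling error alone would give a
    result as large or larger than the one observed (between 0 and 1).
    """
    if not 0.0 <= chance_probability <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    if chance_probability <= 0.01:
        return "significant"
    if chance_probability <= 0.05:
        return "suspend judgment; further observations are necessary"
    return "not significant"


for prob in (0.001, 0.03, 0.20):
    print(prob, classify_result(prob))
```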

The examples of empirical sampling experiments given below illustrate 
successive stages by which the statistician builds up the statistical models 
to be used in interpreting experimental results. This method, the way 
in which the earlier statisticians worked, provides a simple way of under- 
standing quite rigorously the theoretical foundation underlying statistical 
inference. Today it is not usually necessary to do an actual experiment 
in order to construct these statistical models based on sampling errors, 
since the theory of probability enables the statistician to deduce sampling 
distributions theoretically. In fact, the theoretical deduction of the 
sampling distributions of the numerous statistical quantities now in use 
is a highly specialized branch of mathematical statistics. This deduction 
is sometimes a problem of great mathematical difficulty. Particularly 
when new types of observational data are under consideration or where 
information of new kinds is under search, the mathematical problems 
at times have proved to be so formidable that the statisticians have had 
to rely on actual sampling experience. Although the mathematical
derivations are of fundamental importance to statistical theory and
practice, it should be apparent that the conclusions to be drawn from such
models would have no justification beyond the fact that
they agree with what actually happens experimentally or would be
arrived at from these simple sampling experiments. Theory is a tool
which is tested through application and whose usefulness is decided in
connection with the application.


TABLE 2
100 RANDOM SAMPLES OF 5 FOR A VARIABLE X FROM A POPULATION WITH MEAN 30 AND
STANDARD DEVIATION 10


52 19 | 26 | 23 | 25 | 50 | 20 | 34 
28 | 35 | 35 | 29 | 33 | 19 | 32 | 22 
1425 | 30 | 22 | 30 | 29 | 43 | 36 

38 | 22 | 37 | 24 | 33 | 33 | 27 


5
44
27
47
38

29
30

17 | 25
18 | 19
17 | 35
22 | 28
35 | 42

21 | 33
12 | 30

29 | 38
21 | 23
40 | 34

30
26
19

18
10
28
36
27

23
25
23
38
38

25
26
31
31

The Sampling Distribution of the Mean. The first sampling
experiment to be described deals with the arithmetic mean. To illustrate the
way in which random sampling errors arise, consider a normal population
of values of some character, say X, whose mean is taken as 30 and
whose standard deviation is known to be 10; that is, $\mu = 30$,
$\sigma = 10$. Samples of 5 (n = 5) were chosen at random from the
population. By this method, 100 samples of 5 for the variable X were
obtained. The individual values for each of the 5 members of each sample
are recorded for the 100 samples in Table 2. Note the range in values in
the respective samples. For example, in one sample the range in the
X-values is from 5 to 47; in another, from 25 to 33.²

¹ It is conventional to speak of the true values of the population as
parameters and to denote them by Greek letters. Correspondingly, Roman
letters are the symbols used for the estimates made of parameters or
population values from samples.

Next we computed the mean of each of the 100 samples. These
values are recorded in Table 3. The 100 means vary between 19.4 and
40.6, and both the highest and lowest means differ from the population
mean of 30 by 10.6. Obviously, the means are much less scattered than
are the individual values. These fluctuations in mean values are known
as sampling errors. The amount of sampling error in each mean is the
difference between it and the population value, that is, 30. The biggest
error with which any one sample of 5 estimates the population mean is
10.6. The smallest error is found to be 0.2: for instance, the difference
between 29.8 and 30.0. None of the 100 estimates is without sampling error.


TABLE 3
THE 100 MEAN VALUES OF THE 100 SAMPLES OF 5 RECORDED IN TABLE 2


The small samples of 5 give sampling errors greater than would larger 
samples. Had we taken samples of 50, for instance, the means would 
have been less scattered, indicating smaller sampling errors. This 
tendency toward less variation among sampling means and correspond- 
ingly smaller differences between sample means and the true mean, and 
thus a smaller sampling error, would continue as the size of the sample 
became larger and larger. For example, by calculating the mean of the 
100 sample means, we obtain the mean of a single sample of 500 equal to 
29.8, a value very close to the population mean of 30.0. 
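The sampling experiment can be repeated by simulation. The sketch below draws pseudo-random values from a normal population with mean 30 and standard deviation 10 by means of Python's `random.gauss`; this is an assumption that stands in for the tables of random samples used in the text, so the individual values will not match Tables 2 and 3. The function name and the seed are also illustrative.

```python
import random
import statistics


def sample_means(n_samples, sample_size, mu, sigma, rng):
    """Draw n_samples random samples and return the mean of each one."""
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(7)  # fixed seed (an assumption) so runs are repeatable
means_of_5 = sample_means(100, 5, 30, 10, rng)
means_of_50 = sample_means(100, 50, 30, 10, rng)

# Mean and scatter of the 100 sample means for each sample size.
print(round(statistics.fmean(means_of_5), 1), round(statistics.stdev(means_of_5), 2))
print(round(statistics.fmean(means_of_50), 1), round(statistics.stdev(means_of_50), 2))
```

The means of samples of 50 should be visibly less scattered about 30 than the means of samples of 5, which is the tendency described above.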

For a sample of a given size, the errors of random sampling increase
as the variation among individuals in the population becomes greater.
In fact, the sampling errors are directly proportional to the
variation in the population. As an extreme case, it is obvious that, had
there been no variation among individuals in the population sampled
and had they all been 30, the means irrespective of sample size would
have been 30. Hence, there would have been no sampling errors.

For example, the estimated mean is denoted by $\bar{x}$; the estimated
standard deviation, by s. This convention is followed throughout this book.

² Mahalanobis (see Ref. 5) and others have given tables of random samples from a
normal distribution and have shown how to use these tables to get samples of any size
for any mean and standard deviation. We have followed this method in several of
the empirical sampling experiments described.

The means in Table 3 may be arranged into a frequency distribution, 
thus showing the number or frequency of means falling between limits 
as noted on the base scale (Table 4). This frequency distribution of 
means is presented in the form of a histogram in Fig. 1. Measures of 
central location and variability for this frequency distribution of means, 
which is called the sampling distribution of means, can be calculated. 
It is to be expected that the mean of the 100 means should be the same 
as the mean of the population being sampled. In our case, the mean 
of the distribution is found to be 29.8, which agrees closely with 30, the 
true mean. By increasing the size of the samples, the observed value 
would become almost exactly 30. The standard deviation of the sampling 
distribution of means gives an estimate of the size of sampling errors, 
thus summing up the information concerning the whole distribution of 
errors. If the standard deviation of the sampling distribution is large, 
the errors of sampling are, as a whole, large. Correspondingly, if the 
standard deviation is small, the errors are small. The standard deviation 
of the frequency distribution of means in Table 4, calculated in the usual 
manner, is 4.82. 

Figure 1. Distribution of $\bar{X}$'s of 100 random samples of 5 from a normal 
population with mean 30 and standard deviation 10. Normal curve superimposed 
on the histogram. [Horizontal axis: values of $\bar{X}$; vertical axis: frequency.] 

As was pointed out earlier, the statistician does not usually carry 
out this very simple and tedious sampling procedure, since application 
of the mathematical theory of probability enables him to determine 
theoretically the sampling distribution and the standard error of a 
statistic. The application of known mathematical laws gives results 
that are as accurate as a sampling experiment using millions of samples. 
Hence, the method of mathematical deduction is at the same time less 
laborious and more accurate. If the samples are all drawn from a 
normal population under ideal random sampling conditions, it is known, 
as in our case, that the sample means are normally distributed about the 
population mean with a standard deviation equal to $\sigma/\sqrt{n}$, where $\sigma$ 
denotes the value of the population standard deviation and $n$, the number 
of sampling units. Even if the variable is not normally distributed in the 
population, it is known that the distribution of totals, or of the means, 
tends toward normality as the size of the sample is increased. 
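
The rule $\sigma/\sqrt{n}$ can be checked numerically, including for a parent population that is not normal. The sketch below is illustrative only: the uniform parent on the interval 0 to 60 (mean 30), the sample size of 25, the 2,000 repetitions and the seed are all assumptions chosen for the example, not values from the experiment in the text.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Theoretical standard deviation of the mean of n independent observations."""
    return sigma / math.sqrt(n)

# Samples of 5 from the normal population with sigma = 10 used in the text.
print(round(standard_error(10, 5), 3))

# A non-normal parent (assumed for illustration): uniform on [0, 60],
# whose standard deviation is 60 / sqrt(12).
rng = random.Random(7)
sigma_u = 60 / math.sqrt(12)
means = [statistics.fmean(rng.uniform(0, 60) for _ in range(25)) for _ in range(2000)]

# The observed scatter of the means should lie close to sigma_u / sqrt(25).
print(round(statistics.pstdev(means), 2), "vs", round(standard_error(sigma_u, 25), 2))
```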


TABLE 4 
FREQUENCY DISTRIBUTION OF THE 100 MEAN VALUES FOR THE SAMPLES OF 5 GIVEN IN 
TABLE 3 AND THE TEST OF GOODNESS OF FIT 

                                      Pooled   Pooled 
Class interval       f_o     f_t       f_o      f_t    (f_o - f_t)^2   (f_o - f_t)^2/f_t 
41.95 to +∞            0     0.377 
39.95 to 41.95         1     0.927       6      3.77       4.9729           1.319 
37.95 to 39.95         5     2.466 
35.95 to 37.95         5     5.405       5      5.41       0.1681           0.031 
33.95 to 35.95         6     9.687       6      9.69      13.6161           1.405 
31.95 to 33.95        17    14.280      17     14.28       7.3984           0.518 
29.95 to 31.95        15    17.297      15     17.30       5.2900           0.306 
27.95 to 29.95        17    17.213      17     17.21       0.0441           0.003 
25.95 to 27.95        12    14.101      12     14.10       4.4100           0.313 
23.95 to 25.95         7     9.444       7      9.44       5.9536           0.631 
21.95 to 23.95        10     5.210      10      5.21      22.9441           4.404 
19.95 to 21.95         4     2.361 
17.95 to 19.95         1     0.879       5      3.59       1.9881           0.554 
-∞ to 17.95            0     0.353 

Total                100   100.000     100    100.00                  χ² = 9.484 

d.f. = 9; P > .35 


In our case, then, for samples of 5 from the known normal population, 
we expect the sample means to be normally distributed about 30 with a 
standard deviation of $\sigma/\sqrt{n} = 10/\sqrt{5} = 4.472$. Our empirical results, 
that is, $\bar{x} = 29.8$ and $s_{\bar{x}} = 4.82$, seem to be in close agreement. It is 
also noted (Table 4) that the observed distribution of means seems to 
agree very well with the theoretical values deduced on the above theory 
using the mean and standard deviation calculated from the population 
values. Even with samples as small as 5, the agreement between 
observation and expectation seems close. We need, of course, a more 
careful definition of what is meant by "seemingly close." This is given 
by the chi-square ($\chi^2$) test for goodness of fit. Referring to the $\chi^2$-table 
(Table III, Appendix) with 9 degrees of freedom, we note that for 
$\chi^2 = 9.484$, P > .35. Therefore, we accept the 
hypothesis that our 100 mean values ($\bar{X}$'s) are normally distributed. 
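
The arithmetic of the goodness-of-fit test can be retraced from the pooled columns of Table 4. The short sketch below simply sums $(f_o - f_t)^2/f_t$ over the ten pooled classes; the variable names are ours. Obtaining the probability P from the value still requires a $\chi^2$ table, which the sketch does not attempt.

```python
# Pooled observed and theoretical frequencies, copied from Table 4 (top class first).
observed = [6, 5, 6, 17, 15, 17, 12, 7, 10, 5]
expected = [3.77, 5.41, 9.69, 14.28, 17.30, 17.21, 14.10, 9.44, 5.21, 3.59]

# Chi-square is the sum of squared discrepancies, each divided by its expected frequency.
chi_square = sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, expected))

# The theoretical frequencies use the population mean and standard deviation,
# so only the total is fixed: degrees of freedom = number of classes - 1.
degrees_of_freedom = len(observed) - 1

print(round(chi_square, 2), degrees_of_freedom)
```

The result agrees with the tabled value of 9.484 to within rounding of the individual terms.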

The Sampling Distribution of the Difference Between Means. Our 
second sampling experiment deals with the differences between the means 
of random samples. Here we have taken 100 random samples of size 5 
for a pair of variables, say $X_1$ and $X_2$, which are independent of each 
other. We know that our parent population of X is normally distributed 
with mean $\mu = 30$ and standard deviation $\sigma = 10$. 

The mean-difference values of $X_1$ and $X_2$ for the 100 samples have 
been calculated and recorded in Table 5. 


TABLE 5 
THE MEAN-DIFFERENCE VALUES OF $X_1$ AND $X_2$ FOR THE 100 SAMPLES OF 5 


— 4.8 6.4|—12.6| 0.6] 12.6|- 1.4] 0 4.6 |- 0.6 | 1.4 
15.0 02| 48|-84|-04|- 28| 04|-16| 0.6|- 8.4 
—-8S4|-t4-£5|—-10|—-1.0(—0.8| 14) 3.6) 5.0) $2 
0.61 osl—s0| 13.0] 40|-158|- 2.8] 10.6] 14) 8.6 
70|-22|-86| 0.2|—-11.6|-17.4|- 6.4] 04) 0.6|— 9.2 
- 0.4 4.0 62| 44| 16.4|- 3.2 | —12.8 |- 4.2 |- 2.2| 12.0 
14] 74] 02|-84| 40| 80| 94|-6.0|- 1.0 |= 6.0 
- фа | 56| 18| 54| 16|-74|-6.2|- 6.2] 7.0) 12 
12| as i30| 54| 34|-2.8| &2|- $.0| 14.2 |= 14 
1.0| —4.0 |- 6.2 |- 7.4 |= 4-6 |-17.6 3.0 2.8 8.0 0.6 


From sampling theory it is known that the mean-difference values, 
$(\bar{X}_2 - \bar{X}_1)$'s, are normally distributed about a mean of 0 with standard 
deviation $\sqrt{2}\,\sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ 
denotes the sample size. 

The mean-difference values in Table 5 are arranged in a frequency 
distribution in Table 6. We find the mean of the mean-difference values 
to be $\bar{d} = .39$, and the standard deviation, or standard error, of the mean 
of differences to be $s_d = 6.736$. The corresponding parameter values are 
$\mu_d = 0$ and $\sigma_d = 6.325$. The observed values are well within the limits 
of sampling error. 
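
The parameter value 6.325 follows directly from $\sqrt{2}\,\sigma/\sqrt{n}$ with $\sigma = 10$ and $n = 5$. The sketch below computes it and then imitates the experiment with pseudo-random numbers; the seed and the helper `mean_of_sample` are our own assumptions, so the simulated mean and standard deviation will differ from the values obtained from Table 5.

```python
import math
import random
import statistics

sigma, n = 10.0, 5

# Theoretical standard deviation of the difference between two independent sample means.
sigma_d = math.sqrt(2) * sigma / math.sqrt(n)
print(round(sigma_d, 3))

rng = random.Random(3)  # arbitrary seed, fixed for repeatability

def mean_of_sample():
    """Mean of one random sample of n from the normal population N(30, sigma)."""
    return statistics.fmean(rng.gauss(30.0, sigma) for _ in range(n))

# 100 differences between the means of independent samples of 5.
diffs = [mean_of_sample() - mean_of_sample() for _ in range(100)]
print(round(statistics.fmean(diffs), 2), round(statistics.pstdev(diffs), 2))
```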


Again we wish to test the goodness of fit of the normal distribution. 
The theoretical frequencies ($f_t$) were calculated and are given in Table 6. 
Chi-square is the appropriate test of goodness of fit of the theoretical 
and observed distributions. Its value is found to be 10.985. We 
enter the $\chi^2$ table (Table III, Appendix) with 9 degrees of freedom and 
$\chi^2 = 10.985$. The corresponding probability value is P > .27. Hence, 
we may conclude that the 100 mean-difference values are normally 
distributed in accordance with sampling theory. 

3 See page 96. 

The Sampling Distribution of the Variance. Our third sampling 
experiment deals with the variance. Samples of 5 were chosen at random 
from a normal population with mean $\mu = 30$ and variance $\sigma^2 = 100$. 
The sum of the squares of the deviations of the observational values $X_{ij}$ 
from their mean $\bar{X}_i$ for each of the 100 samples was obtained, and these 
values are recorded in Table 7. 


TABLE 6 
FREQUENCY DISTRIBUTION OF THE 100 MEAN-DIFFERENCE VALUES FOR THE SAMPLES 
OF 5 GIVEN IN TABLE 5 AND THE TEST OF GOODNESS OF FIT 

                               Pooled 
Class interval         f_o      f_o      f_t    f_o - f_t   (f_o - f_t)^2   (f_o - f_t)^2/f_t 
 17.95 to +∞             0 
 15.95 to  17.95         1 
 13.95 to  15.95         2        9      5.79      3.21       10.3041           1.780 
 11.95 to  13.95         5 
  9.95 to  11.95         1 
  7.95 to   9.95 
  5.95 to   7.95                  8     11.55     -3.55       12.6025           1.091 
  3.95 to   5.95        11       11      9.26      1.74        3.0276           0.327 
  1.95 to   3.95         6        6     11.31     -5.31       28.1961           2.493 
 -0.05 to   1.95        19       19     12.41      6.59       43.4281           3.499 
 -2.05 to  -0.05        13       13     12.38      0.62         .3844           0.031 
 -4.05 to  -2.05        12       12     11.19      0.81         .8472           0.076 
 -6.05 to  -4.05         9        9      9.18     -0.18         .0324           0.004 
 -8.05 to  -6.05 
-10.05 to  -8.05                  7     11.33     -4.33       18.7489           1.655 
-12.05 to -10.05         1 
-14.05 to -12.05         2 
-16.05 to -14.05         1        6      5.60      0.40         .1600           0.029 
-18.05 to -16.05         2 
   -∞  to -18.05         0 
Total                  100      100    100.00      0.00                   χ² = 10.985 

d.f. = 9; .30 > P > .20 


We have calculated the variance of each sample by dividing the sum 
of the squares of the deviations of the observation values $X_{ij}$ from $\bar{X}_i$ by 
n = 5. These estimates are called Pearsonian. Thus, 

$$s^2_{i(p)} = \frac{\sum_{j=1}^{5} (X_{ij} - \bar{X}_i)^2}{5}, \qquad i = 1, \ldots, 100;\; j = 1, \ldots, 5 \tag{3.01}$$

where $\bar{X}_i$ is the mean of X for the ith sample and $X_{ij}$ is the jth individual 
in the ith sample. 


The $s^2_{i(p)}$ were arranged into a frequency distribution and the theoretical 
frequencies calculated. 

We found the mean and the standard deviation of the $s^2_{i(p)}$ to be as 
follows: 

$$\bar{s}^2_{(p)} = 68.045, \qquad s_{s^2_{(p)}} = 53.279$$

The test of the goodness of fit of the chi-square function gave a 
$\chi^2 = 11.836$. Entering the table of $\chi^2$ (Table II, Appendix) with 6 d.f., 
the probability was found to be .10 > P > .05. It was concluded that 
the sampling distribution of the $s^2_{i(p)}$ follows the $\chi^2$ distribution. 


TABLE 7 
THE 100 VALUES OF THE SUMS OF SQUARES BASED ON SAMPLES OF 5 FROM A NORMAL 
POPULATION WITH MEAN 30 AND STANDARD DEVIATION 10 


112.0 53.2 85.2 318.8 235.2 
402.8 443.2 174.8 241.2 849.2 
229.2 408.8 186.0 370.8 138.8 
180.8 60.8 442.0 97.2 25.2 
240.0 66.8 444.8 202.0 284.8 
970.8 180.8 470.8 507.2 1613.2 
362.8 401.2 354.8 214.0 339.2 
393.2 437.2 339.2 738.0 466.8 
169.2 518.0 1158.8 323.2 381.2 
322.8 250.8 82.0 422.0 93.2 
181.2 584.8 168.8 176.8 218.8 
180.0 154.0 74.0 86.0 231.2 
93.2 46.8 164.8 313.2 626.0 
85.2 353.2 128.8 542.0 900.8 
354.8 130.8 234.8 57.2 187.2 
485.2 1.2 257.2 176.8 764.8 
201.2 m 632.8 340.8 400.8 
575.2 977.2 118.8 73.2 249.2 
439.2 455.2 433.2 228.8 48.8 
534.8 390.0 111.2 303.2 63.2 


Each of the $s^2_{i(p)}$ is an estimate of the variance, $\sigma^2$, of the population 
($\sigma^2 = 100$) from which we were sampling. Therefore, we expect the mean 
of the 100 samples of 5 to be approximately equal to $\sigma^2$ or 100. From 
sampling theory it is known that the expected standard deviation of the 
$s^2_{(p)}$'s is: 

$$\sigma_{s^2_{(p)}} = \frac{\sigma^2 \sqrt{2(n - 1)}}{n} = \frac{100\sqrt{8}}{5} = 56.57 \tag{3.02}$$

Our calculated value of the standard deviation of the 100 obtained 
values of $s^2_{i(p)}$ was 53.279. Thus, both the mean and the standard deviation 
of the 100 sample values of $s^2_{i(p)}$ differ considerably from expectation. 

Estimates are considered biased estimates if in repeated sampling their 
mean or mathematical expectation does not equal the true, or population, 
value. Our obtained mean value, $\bar{s}^2_{(p)}$, is too low. It is 1.22 standard 
errors below the true value. Although the obtained mean is within the 
limit of random sampling fluctuations, we shall consider whether or not 
a closer agreement with expectation can be obtained. 

We shall now calculate our estimate in a different manner. Define: 

$$s^2_{i(u)} = \frac{\sum_{j=1}^{5} (X_{ij} - \bar{X}_i)^2}{n - 1}, \qquad i = 1, \ldots, 100 \tag{3.03}$$

where the subscript (u) indicates the unbiased estimate. 
We calculated the 100 values of $s^2_{i(u)}$. They are recorded in Table 8. 


TABLE 8 
100 UNBIASED ESTIMATES, $s^2_{(u)}$, CALCULATED FROM THE SUMS OF SQUARES IN TABLE 7 

 28.0    45.3    13.3   146.2    21.3    42.2    79.7    44.2    58.8    54.7 
100.7    45.0   110.8    38.5    43.7    18.5    60.3    21.5   212.3    57.8 
 57.3    23.3   102.2    11.7    46.5    41.2    92.7    78.3    34.7   156.5 
 45.2    21.3    15.2    88.3   110.5    32.2    24.3   135.5     6.3   225.2 
 60.0    88.7    16.7   182.7   111.2    58.7    50.5    14.3    71.2    46.8 
242.7   121.3    37.7   245.3   117.7    64.3   126.8    44.2   403.3   191.2 
 90.7    50.3   100.3   117.7    88.7   158.2    53.5    86.7    84.8   100.2 
 98.3   143.8   109.3   244.3    84.8    29.7   184.5    18.3   116.7    62.3 
 42.3   109.8   129.5   113.8   289.7   108.3    80.8    57.2    95.3    12.2 
 80.7   133.7    62.7    97.5    20.5    27.8   105.5    75.8    23.3    15.8 

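Define again: 

$$\bar{s}^2_{(u)} = \frac{1}{100}\sum_{i=1}^{100} s^2_{i(u)} \tag{3.04}$$

$$s_{s^2_{(u)}} = \text{standard deviation of the } s^2_{i(u)}, \qquad i = 1, \ldots, 100 \tag{3.05}$$

where the subscript (u) again indicates the unbiased estimate. 
From Table 8 we obtain 

$$\bar{s}^2_{(u)} = 85.92, \qquad s_{s^2_{(u)}} = 67.04$$

Theoretically, we have 

$$\mu_{s^2_{(u)}} = \sigma^2 = 100 \tag{3.06}$$

$$\sigma_{s^2_{(u)}} = \sigma^2 \sqrt{\frac{2}{n - 1}} = 100\sqrt{\tfrac{2}{4}} = 70.71 \tag{3.07}$$

We now observe that our calculated value, $\bar{s}^2_{(u)} = 85.92$, is .46 standard 
error below the true value, well within the limits of random sampling 
fluctuations. 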
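
The two ways of turning a sum of squares into a variance estimate, and the theoretical standard deviations of Formulas (3.02) and (3.07), can be written out as a short sketch. The function names are ours; the sum of squares 112.0 is one of the entries of Table 7, whose unbiased estimate 28.0 appears in Table 8. The expected value $\sigma^2 (n - 1)/n$ used for the Pearsonian estimate is the standard normal-theory result.

```python
import math

def variance_estimates(sum_of_squares, n):
    """Return (Pearsonian estimate, unbiased estimate) from a sum of squared deviations."""
    return sum_of_squares / n, sum_of_squares / (n - 1)

def theoretical_moments(sigma2, n):
    """(mean, standard deviation) of each estimator in repeated sampling
    from a normal population with variance sigma2."""
    pearsonian = (sigma2 * (n - 1) / n, sigma2 * math.sqrt(2 * (n - 1)) / n)
    unbiased = (sigma2, sigma2 * math.sqrt(2 / (n - 1)))
    return pearsonian, unbiased

# One sum of squares from Table 7: 112.0 gives 22.4 (divisor 5) and 28.0 (divisor 4).
print(variance_estimates(112.0, 5))

# Theoretical values for sigma^2 = 100 and n = 5.
pearsonian, unbiased = theoretical_moments(100.0, 5)
print([round(v, 2) for v in pearsonian], [round(v, 2) for v in unbiased])
```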

We may now state that the usual method of calculating the estimate 
of the population variance, that is, $s^2_{(p)}$ as an estimate of $\sigma^2$, gives a 
biased estimate. The explanation of the bias is that, if we use $s^2_{(p)}$, its 
mean in repeated sampling is not $\sigma^2$ but $\frac{n - 1}{n}\sigma^2$, where n is the size of 
the sample. In our case: 

$$s^2_{(p)} \text{ is an estimate of } \frac{n - 1}{n}\,\sigma^2 = \frac{4}{5}(100) \text{ or } 80 \tag{3.08}$$

Now, our calculated value of $\bar{s}^2_{(p)} = 68.95$ does not differ significantly 
from the theoretical value 80; that is, it is .44 standard deviation below, 
and the difference is due to sampling error alone. 

The amount of the bias in using $s^2_{(p)}$ as an estimate of $\sigma^2$ is evidently 
$\frac{1}{n}\sigma^2$. The theoretical standard deviation of the distribution of $s^2_{(p)}$, when 
n is large, is $\sigma^2\sqrt{2(n - 1)}/n$. It may be worth noting the relative 
magnitude of the bias and sampling error. The respective values for 
various sizes of samples are recorded in Table 9. 

The values in columns (2) and (3) of Table 9 show that the bias is 
substantial in comparison with random sampling error, especially for 
small samples, for instance n = 50 or less. The conclusion is that, since 
there is no justification for willfully introducing a bias, the unbiased 
estimate, $s^2_{(u)}$, should be used when estimates of the population variance 
are required as in problems of statistical inference. When mere description 
is involved, $s^2_{(p)}$ may properly be used. 


TABLE 9 
COMPARISON OF THE BIAS IN USING $s^2_{(p)}$ AS AN ESTIMATE OF $\sigma^2$ WITH SAMPLING ERROR 

Size of sample    Relative amount of bias    Relative amount of sampling error 
      n                    1/n                      sqrt(2(n - 1))/n 
     (1)                   (2)                            (3) 
       2                   0.50                           0.71 
       3                   0.33                           0.67 
       5                   0.20                           0.57 
      10                   0.10                           0.42 
      20                   0.05                           0.31 
      50                   0.02                           0.20 
     100                   0.01                           0.14 
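
The two columns of Table 9 follow from the expressions $1/n$ and $\sqrt{2(n - 1)}/n$, both stated as fractions of $\sigma^2$. A minimal sketch of the calculation (the function names are ours) is:

```python
import math

def relative_bias(n):
    """Bias of the Pearsonian estimate as a fraction of sigma^2."""
    return 1 / n

def relative_sampling_error(n):
    """Standard deviation of the Pearsonian estimate as a fraction of sigma^2."""
    return math.sqrt(2 * (n - 1)) / n

# Print one line per sample size used in Table 9: n, bias, sampling error.
for n in (2, 3, 5, 10, 20, 50, 100):
    print(f"{n:>4}  {relative_bias(n):.2f}  {relative_sampling_error(n):.2f}")
```

Both quantities decrease as n grows, but the bias falls off as $1/n$ while the sampling error falls off only as roughly $1/\sqrt{n}$, so the bias matters most in small samples.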


Likewise, it is customary to consider the square root of the unbiased 
estimate of the variance as the unbiased estimate of the standard deviation: 

$$s_{i(u)} = \sqrt{\frac{\sum_{j=1}^{5} (X_{ij} - \bar{X}_i)^2}{n - 1}}, \qquad i = 1, \ldots, 100;\; j = 1, \ldots, 5 \tag{3.09}$$



It may be stated, however, that although the mean of $s^2_{(u)}$ in repeated sampling equals 
$\sigma^2$, Equation (3.09) does not imply that in repeated sampling the mean of 
$s_{(u)}$ equals $\sigma$. It is well known that the mean of a set of numbers does not 
exactly equal the square root of the arithmetic mean of their squares. 
To illustrate: 

$$\bar{X} = \tfrac{1}{6}(3 + 4 + 5 + 6 + 8 + 9) = 5.83$$
$$\sqrt{\text{Mean of } (X^2)} = \sqrt{\tfrac{1}{6}(9 + 16 + 25 + 36 + 64 + 81)} = 6.20$$
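
The illustration is easily verified. The following lines are only a check of the arithmetic for the six numbers 3, 4, 5, 6, 8 and 9; the variable names are ours.

```python
import math
import statistics

values = [3, 4, 5, 6, 8, 9]

# Ordinary arithmetic mean of the numbers.
arithmetic_mean = statistics.fmean(values)

# Square root of the arithmetic mean of their squares.
root_mean_square = math.sqrt(statistics.fmean(v * v for v in values))

print(round(arithmetic_mean, 2), round(root_mean_square, 2))
```

The root mean square exceeds the mean whenever the numbers are not all equal, which is why the square root of an unbiased variance estimate is not itself an unbiased estimate of the standard deviation.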


We have arranged the hundred $s^2_{(u)}$'s in Table 8 into a frequency 
distribution (Table 10). The theoretical values have also been calculated 
and the theoretical curve based on these has been constructed in Fig. 2. 
The test of the goodness of fit of the theoretical for the observed 
frequencies gives a $\chi^2$ value of 15.25 with P > .05. The model, in this 
case built up of the sampling errors of the variance, is known as the 
chi-square curve. This model is important in statistical theory and 
practice (see Table III, Appendix). 


TABLE 10 
FREQUENCY DISTRIBUTION OF THE 100 UNBIASED ESTIMATES $s^2_{(u)}$ OF THE POPULATION 
VARIANCE IN TABLE 8 AND THE TEST OF GOODNESS OF FIT 

Interval                  f_o    f_t    f_o - f_t    (f_o - f_t)^2/f_t 
331.925 to +∞ 
291.700 to 331.925 
237.200 to 291.700          7     10       -3             0.90 
194.475 to 237.200 
149.725 to 194.475          5     10       -5             2.50 
121.950 to 149.725          6     10       -4             1.60 
 83.925 to 121.950         27     20        7             2.45 
 54.875 to  83.925         16     20       -4             0.80 
 41.225 to  54.875         14     10        4             1.60 
 26.600 to  41.225          8     10       -2             0.40 
 17.775 to  26.600          9      5        4             3.20 
 10.725 to  17.775 
  7.425 to  10.725          8      5        3             1.80 
  0.000 to   7.425 
Total                     100    100        0         χ² = 15.25 

d.f. = 8; .10 > P > .05 


The Sampling Distribution of t. We have now considered two principal 
models, the normal and the chi-square, which the statistician has 
developed. It is to be remembered that in the development of both of 
these models it was assumed that the variance or standard deviation of 
the population was known. It is not often the case, however, in experimental 
work that the population value is known. Furthermore, the 
experimenter is usually dealing with small samples. The construction 
of a model against which experimental results of this kind could be compared 
would indeed be a genuine contribution to the research worker. Let 
us trace the way in which this problem was solved. 

Since the population standard deviation is unknown, the only source 
of information concerning it is that provided by the sample. It was 
observed (see Table 8) that the sample variance, and hence the standard 
deviation, was often different from the population standard deviation. 
Each of these sample standard deviations was an estimate of the 
population value. The smallest standard deviation was $\sqrt{6.30}$ or 2.51 and 
the largest $\sqrt{403.30}$ or 20.08. It was essential that a model to be effective 
should take these sampling fluctuations into account. What 
was done was to set up a ratio of the difference between the sample mean 
and population mean to its estimated standard error. This ratio was 
called t. In mathematical terms we may proceed as follows: 

Figure 2. The chi-square distribution curve based on the unbiased estimates 
of the variance, $s^2_{(u)}$'s, of 100 random samples of 5 from a normal population with 
mean 30 and variance 100 (Table 10). [Horizontal axis: values of $s^2_{(u)}$; vertical axis: frequency.] 

Suppose that the parameter $\sigma$ is unknown, though $\mu = 30$ in our 
parent population ($\mu$ may or may not be known). Define: 

$$t_i = \frac{\bar{X}_i - \mu}{\sqrt{\dfrac{\sum_{j} (X_{ij} - \bar{X}_i)^2}{n(n - 1)}}}, \qquad i = 1, \ldots, 100 \tag{3.10}$$

where $\bar{X}_i$ is the mean value of the ith sample and $X_{ij}$ is the jth individual 
in the ith sample. Then 

$$y = \frac{\Gamma\!\left(\frac{m + 1}{2}\right)}{\sqrt{m\pi}\;\Gamma\!\left(\frac{m}{2}\right)} \left(1 + \frac{t^2}{m}\right)^{-\frac{m + 1}{2}} \tag{3.11}$$

where y is the ordinate value for a specific value of t and m is the number 
of degrees of freedom; $\Gamma$ denotes a Gamma function. 
In our sampling experiment we took 100 samples of size 5, 

$$\mu = 30, \qquad n = 5, \qquad m = 4$$

The 100 $t_i$-values were calculated and these were recorded in Table 11. 
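
The ratio t of Formula (3.10) and the ordinate of the standard t-curve with m degrees of freedom can be sketched as follows. The function names and the five observations are invented for illustration; they are not one of the 100 samples of the experiment.

```python
import math
import statistics

def t_statistic(sample, mu):
    """(sample mean - mu) divided by its estimated standard error, as in (3.10)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return (mean - mu) / math.sqrt(sum_sq / (n * (n - 1)))

def t_density(t, m):
    """Ordinate of the t-distribution with m degrees of freedom."""
    coeff = math.gamma((m + 1) / 2) / (math.sqrt(m * math.pi) * math.gamma(m / 2))
    return coeff * (1 + t * t / m) ** (-(m + 1) / 2)

sample = [22, 35, 41, 28, 34]             # hypothetical sample of 5
print(round(t_statistic(sample, 30), 4))  # ratio for a population mean of 30
print(round(t_density(0.0, 4), 4))        # height of the curve at t = 0 for m = 4
```

Only quantities computed from the sample enter the ratio, apart from the hypothetical population mean; the population standard deviation is not needed.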


TABLE 11 
THE 100 t-VALUES FOR 100 RANDOM SAMPLES OF 5 FROM A POPULATION WITH MEAN 
OF 30 AND UNKNOWN VARIANCE $\sigma^2$ 

-2.9580   -5.1504    1.7442   -0.0501   -0.1166 
 1.5152   -0.2974   -2.0972    0.1728   -0.9822 
-0.7680   -0.0442   -0.6558    0.2787    0.6833 
 0.9313    0.4588    0.4254   -1.6330    0.3563 
 2.5981   -1.5321   -1.8147   -0.9440   -0.4770 
 0.3158    1.1654    1.1954   -0.8737    0.2672 
 0.4226    0.1340    0.6648   -0.6114   -1.8454 
-0.9923    1.6255   -1.0684   -0.9877    1.8215 
-2.2691    0.3930    0.2890    0.3483    1.7408 
 0.4480   -0.5083   -1.9755    1.0885    0.6485 
 2.8572    0.8877    0.4131    1.2781   -2.0559 
-1.0000   -0.3604    2.5994    0.9645    0.4706 
-0.7412    1.4382   -0.2787    1.1624    0.1787 
-1.1628    2.5224   -0.8669    1.5368   -0.4172 
 0.5223   -0.0331   -2.3932   -2.1287   -2.7456 
 0.9339   -1.0565   -2.0635   -0.2691    1.2614 
 0.7567   -0.2473    1.3867    0.9126   -1.3850 
-0.6340    0.2003   -3.8645   -2.8226   -0.1700 
 0.3414    0.5450   -1.1603   -1.7148   -0.5121 
 0.3481   -0.9058   -4.4954    2.1574    1.4626 

Theoretically, we have 

$$\mu_t = 0$$

$$\sigma_t = \sqrt{\frac{m}{m - 2}} = \sqrt{\frac{4}{2}} = 1.414$$

where $\mu_t$ and $\sigma_t$ are the mean and standard deviation of all the possible 
t-values, respectively. For our 100 t-values we have 


Again, we wish to test the goodness of fit for the t-distribution by 
using the $\chi^2$-criterion. This test is given in Table 12. 
TABLE 12 
DISTRIBUTION OF THE 100 t-VALUES FROM MEANS OF SAMPLES OF 5 AND TEST OF 
GOODNESS OF FIT 

                                        Pooled   Pooled 
Class interval of t      f_o    f_t      f_o      f_t    f_o - f_t   (f_o - f_t)^2   (f_o - f_t)^2/f_t 
 4.005 to +∞               0    0.79 
 3.005 to  4.005           0    1.20       5      5.78     -0.78        0.6084           0.105 
 2.005 to  3.005           5    3.79 
 1.005 to  2.005          15   12.78      15     12.78      2.22        4.9284           0.386 
 0.005 to  1.005          30   31.25      30     31.25     -1.25        1.5625           0.050 
-0.995 to  0.005          26   31.36      26     31.36     -5.36       28.7296           0.916 
-1.995 to -0.995          12   13.00      12     13.00     -1.00        1.0000           0.077 
-2.995 to -1.995           9    3.83 
-3.995 to -2.995           1    1.19      12      5.83      6.17       38.0689           6.530 
   -∞  to -3.995           2    0.81 
Total                    100  100.00     100    100.00      0.00                   χ² = 8.064 

d.f. = 5; P > .14 


Referring to the $\chi^2$ table (Table III, Appendix) with $\chi^2 = 8.064$ and 
5 degrees of freedom, we find that P > .14. Therefore, we conclude 
that our 100 t-values are distributed as the t-function. 

We have arranged the 100 t-values in a frequency distribution and 
plotted the histogram. The theoretical frequency distribution of t has 
been calculated and the corresponding curve has been superimposed on 
the histogram (Fig. 3). The theoretical frequency curve of the sampling 
distribution of t is a symmetrical leptokurtic curve. Tables (see Table II, 
Appendix) have been prepared which enable one to determine for a 
given size of sample the probability of getting a value of t greater than or 
equal to ±t, the value in the sample, due to random sampling errors 
alone in repeated sampling. Against this model, when it is appropriate 
for the problem involved, the experimenter may then compare his experimental 
results with the view of examining their significance. 

Figure 3. Distribution of the t-values of 100 random samples of 5. Theoretical 
curve of the t-distribution superimposed upon the histogram. [Horizontal axis: values of t; vertical axis: frequency.] 

Contribution of "Student." It is fitting here to point out the significance 
of the contribution of the writer who signed himself "Student" 
to the refinement of the classical theory of errors. First, it is usually 
held that the date of his publication (Ref. 7), 1908, is the beginning of 
modern statistical theory and practice. When Student began his work 
as one of the brewers of Guinness, Son and Company, the available 
statistical tools were postulated upon large sampling theory. In the 
course of his work it was necessary for him to draw conclusions from the 
results of small samples which themselves furnished the only indication 
of their variability. Rigorous conclusions under such conditions became 
possible through Student's determination of the exact sampling distribution 
of the statistic, thus making allowance for its sampling errors. He 
demonstrated that notwithstanding these sampling errors, which in the 
case of very small samples are large, it was possible to derive a test of 
significance both rigorous and exact. Since the number of degrees of 
freedom is one of the parameters in the equation of the sampling distribution, 
the restriction previously set up, namely, that the sample must be 
"large," was removed. 

The applicability of Student's test has, of course, been greatly 
extended by later research in mathematical statistics. There are two 
contributions of Student which will undoubtedly endure: (1) a "Studentized" 
statistic, that is, a statistic whose sampling distribution, originally involving 
the standard deviation of the population, is altered so that its sampling 
distribution uses quantities calculated only from the sample; (2) an exact 
test of significance, that is, a test which depends on a known probability 
distribution and thus is independent of irrelevant unknown parameters. 
The t-Distribution of the Difference between Means. More frequent 
than the need of comparing experimental results of a single sample 
is that of comparing the results for two independent samples, 
for example, the difference between the means of the experimental and 
control groups. Therefore, we present the results of an empirical sampling 
experiment dealing with the model built up of the sampling 
errors of differences between means. The samples used in this case 
were the ones obtained in the sampling experiment described on page 37 
for Table 5, where 100 random samples of five for a pair of variables 
$X_1$ and $X_2$, which are independent of each other, were taken. Here we 
have assumed that we do not know the population standard deviation 
and have to estimate it from the sample. The results in this case 
are found to be described by the model t-distribution. Suppose that we 
do not know the parameters, $\sigma_1$ and $\sigma_2$, though we know $\mu_1 = 30$ and 
$\mu_2 = 30$ in our parent population ($\mu_1$ and $\mu_2$ may or may not be known). 

Define: 

$$t_i = \frac{\bar{X}_{1i} - \bar{X}_{2i}}{\sqrt{\dfrac{\sum_{j} (X_{1ij} - \bar{X}_{1i})^2 + \sum_{j} (X_{2ij} - \bar{X}_{2i})^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \tag{3.12}$$

where $\bar{X}_{1i}$ is the mean value of $X_1$ in the ith sample; $\bar{X}_{2i}$ is the mean 
value of $X_2$ in the ith sample; $X_{1ij}$ is the jth individual in the ith 
sample for $X_1$; and $X_{2ij}$ is the jth individual in the ith sample for $X_2$. Then it 
may be shown (Ref. 1) that for samples, $n_1$ and $n_2$, from a normal population, 
the distribution of t is given by 

$$y = \frac{\Gamma\!\left(\frac{m + 1}{2}\right)}{\sqrt{m\pi}\;\Gamma\!\left(\frac{m}{2}\right)} \left(1 + \frac{t^2}{m}\right)^{-\frac{m + 1}{2}} \tag{3.13}$$

where y is the ordinate value for a specific value of t and m is the number 
of degrees of freedom. In our case, 

$$n_1 = 5, \qquad n_2 = 5, \qquad m = n_1 + n_2 - 2 = 8$$

The t-values have been calculated for the 100 samples of 5 for a pair 
of values $X_1$ and $X_2$ and are recorded in Table 13. 

Theoretically, we have 

$$\mu_t = 0$$

$$\sigma_t = \sqrt{\frac{m}{m - 2}} = \sqrt{\frac{8}{6}} = 1.1547$$

where $\mu_t$ and $\sigma_t$ are the mean and standard deviation of all the possible 
t-values, respectively. For our 100 t-values, we have 


Finally, we wish to test the goodness of fit for the t-distribution by 
using the $\chi^2$-criterion. The test is given in Table 14. 
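
The t-ratio for the difference between the means of two independent samples, with the sums of squares pooled over $n_1 + n_2 - 2$ degrees of freedom, may be sketched as below. The function names and the two samples of 5 are hypothetical and are not taken from the experiment.

```python
import math
import statistics

def two_sample_t(x1, x2):
    """Difference between two sample means divided by its estimated standard error."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    # Sums of squared deviations of each sample about its own mean, pooled.
    ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    pooled_variance = ss / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

def t_standard_deviation(m):
    """Standard deviation of the t-distribution with m degrees of freedom (m > 2)."""
    return math.sqrt(m / (m - 2))

x1 = [22, 35, 41, 28, 34]   # hypothetical sample for X1
x2 = [30, 26, 39, 33, 27]   # hypothetical sample for X2
print(round(two_sample_t(x1, x2), 4))
print(round(t_standard_deviation(8), 4))
```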


TABLE 13 
THE 100 t-VALUES FOR MEAN DIFFERENCES OF 100 RANDOM SAMPLES OF 5 FOR 
PAIRS OF VALUES $X_1$ AND $X_2$ 

-0.9851   -2.4885    1.5956    0.0000   -0.1589 
 3.2678    0.6438   -0.0719    0.0591    0.0708 
-0.5728   -0.6058   -0.2118    0.2773    1.4153 
 0.1244   -0.7650    0.6870   -0.3565    0.3285 
 1.3229   -0.5528   -1.6920   -1.7380    0.1232 
-0.0400    1.7302    3.0993   -2.1881   -0.2098 
 0.2387    0.0299    0.7405    1.8521   -0.1802 
-0.5922    2.1985    0.2455   -0.7852    1.0640 
 0.2519    2.1117    0.3732    0.8348    2.3370 
 0.0969   -1.1158   -1.2291    0.3926    3.2338 
 0.8221    0.0856   -0.8300    0.9167   -0.2726 
 0.0358   -0.8355   -0.5859   -0.6295   -1.7950 
-0.5491   -0.2123   -0.1858    0.5357    0.6711 
 0.1451    2.1691   -4.8824    1.7070    0.4798 
-0.5952    0.0301   -2.8705    0.0983   -1.4868 
 0.4930    0.5699   -0.4897   -1.2175    1.1048 
 1.8076   -0.6288    1.1622   -0.8290   -0.9685 
-0.8922    0.4934   -1.8228   -1.2267    0.2524 
 0.5445    0.9909   -0.5887   -0.1298   -0.1801 
-0.5218   -1.5868   -2.6807    0.4127    0.1375 


Referring to the $\chi^2$ table (Table III, Appendix) with $\chi^2 = 8.817$ and 
with 7 degrees of freedom, we find that P > .25. Therefore, we conclude 
that our 100 t-values are distributed as the t-function. 

The Sampling Distribution of the Correlation Coefficient. We now 
present the results of a sampling experiment which illustrate the theoretical 
or statistical model built up from the sampling errors of the 
correlation coefficient in repeated random sampling from a population in 
which the true correlation is known to be zero. 
The samples used were the ones obtained by taking 100 samples of 
5 pairs of values from a normal population in which there was no correlation 
at all between the variables (see page 37). 
TABLE 14 
TEST OF GOODNESS OF FIT OF THE THEORETICAL t-DISTRIBUTION FOR THE OBSERVED 
t-VALUES OF TABLE 13 

                              Pooled 
Classes               f_o      f_o      f_t    f_o - f_t   (f_o - f_t)^2   (f_o - f_t)^2/f_t 
 3.505 to +∞            0 
 3.005 to  3.505        3 
 2.505 to  3.005        0       12      8.57      3.43       11.7649           1.373 
 2.005 to  2.505        4 
 1.505 to  2.005        5 
 1.005 to  1.505        5        5      8.66     -3.66       13.3956           1.547 
 0.505 to  1.005       11       11     14.11     -3.11        9.6721           0.685 
 0.005 to  0.505       24       24     18.46      5.54       30.6916           1.663 
-0.495 to  0.005       15       15     18.58     -3.58       12.8164           0.690 
-0.995 to -0.495       18       18     14.16      3.84       14.7456           1.041 
-1.495 to -0.995        5        5      8.77     -3.77       14.2129           1.621 
-1.995 to -1.495        5 
-2.495 to -1.995 
-2.995 to -2.495                10      8.69      1.31        1.7161           0.197 
-3.495 to -2.995 
   -∞  to -3.495 
Total                 100      100    100.00      0.00                    χ² = 8.817 

d.f. = 7; P > .25 


Let us define: 

$$r_i = \frac{\sum_{j} (X_{1ij} - \bar{X}_{1i})(X_{2ij} - \bar{X}_{2i})}{\sqrt{\sum_{j} (X_{1ij} - \bar{X}_{1i})^2 \, \sum_{j} (X_{2ij} - \bar{X}_{2i})^2}}, \qquad i = 1, \ldots, 100;\; j = 1, \ldots, 5 \tag{3.14}$$

where $\bar{X}_{1i}$ and $\bar{X}_{2i}$ are the means for $X_1$ and $X_2$, respectively, in the ith 
sample, and $X_{1ij}$ and $X_{2ij}$ are the jth individual in the ith sample for 
$X_1$ and $X_2$, respectively. The r-values for the 100 samples in Table 5 
have been calculated and recorded in Table 15. 
It is known that r is distributed in repeated sampling in the 
following function: 

$$y = \frac{\Gamma\!\left(\frac{n - 1}{2}\right)}{\sqrt{\pi}\;\Gamma\!\left(\frac{n - 2}{2}\right)} \left(1 - r^2\right)^{\frac{n - 4}{2}} \tag{3.15}$$

where y is the ordinate value for a specific value of r and n is the sample 
size. In our case, n = 5. 
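
The correlation coefficient of a sample of pairs, and the ordinate of its sampling distribution when the true correlation is zero, can be sketched as follows. The density is written in the standard form for a zero population correlation; the function names and the five pairs are hypothetical.

```python
import math
import statistics

def correlation(x1, x2):
    """Product-moment correlation coefficient of paired observations."""
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    sum_products = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    ss1 = sum((a - m1) ** 2 for a in x1)
    ss2 = sum((b - m2) ** 2 for b in x2)
    return sum_products / math.sqrt(ss1 * ss2)

def r_density(r, n):
    """Ordinate of the sampling distribution of r for samples of n pairs
    from a normal population in which the true correlation is zero."""
    coeff = math.gamma((n - 1) / 2) / (math.sqrt(math.pi) * math.gamma((n - 2) / 2))
    return coeff * (1 - r * r) ** ((n - 4) / 2)

x1 = [22, 35, 41, 28, 34]   # hypothetical values of X1
x2 = [30, 26, 39, 33, 27]   # hypothetical values of X2
print(round(correlation(x1, x2), 4))
print(round(r_density(0.0, 5), 4))        # height of the curve at r = 0 for n = 5
print(round(1 / math.sqrt(5 - 1), 3))     # theoretical standard deviation of r
```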


TABLE 15 
THE 100 VALUES OF THE CORRELATION COEFFICIENT r CALCULATED FOR 100 RANDOM 
SAMPLES OF 5 FROM A POPULATION IN WHICH THE TRUE CORRELATION IS ZERO 


—.829 
—.669 
—.692 
.142 
.126 


.548 
.522 
.830 
.563 
.678 


—.892 

.378 
—.625 
—.618 
—.082 


.528 
481 
— .296 
.298 
-.2%78 


—.954 .859 -. 
—.184 .849 
— .862 .168 - 
—.091 — .285 
— .926 .289 
— .408 —.706 - 
891 — .636 
„111 -111 
.487 .196 
.733 —.212 
.196 —.108 - 
.847 .196 
.079 —.681 - 
.549 — .804 
-.488 —.294 
368 —.716 - 
—.881 -.988 
-.276 -.485 == 
089 -.232 
.120 725 
810 
.876 
.748 
.663 
.152 
.274 
.370 
.150 
.516 
.007 
124 
.640 
.589 
921 
216 
.294 
—.614 
.482 
—.726 
—.640 
—.114 
.114 
.482 
.720 
-.744 
- .690 
—.885 
.665 
— .763 
.737 


Theoretically, we have 

$$\mu_r = 0, \qquad \sigma_r = \frac{1}{\sqrt{n - 1}} = .500$$

where $\mu_r$ and $\sigma_r$ are the mean and the standard deviation of all the possible 
r-values, respectively. For our 100 r-values, we have 

$$\bar{r} = -.0596, \qquad s_r = .5076$$


Finally, we wish to test the goodness of fit for the r-distribution by 
using the $\chi^2$-criterion.⁴ The test is given in Table 16. 

4 F. N. David, Tables of the Correlation Coefficient, Biometrika Office, University 
College, London, 1938. 

Referring to the $\chi^2$-table (Table III, Appendix) with $\chi^2 = 12.004$ 
and with 8 degrees of freedom, we have P > .14. Therefore, we conclude 
that our 100 r-values are distributed as the r-function [Formula 
(3.15)]. 

It is known from sampling theory that for large samples, where n is 
larger than 100, r is approximately normally distributed about zero 
($\rho = 0$) with a standard deviation equal to $\dfrac{1}{\sqrt{n - 1}}$. In our sampling 
experiment it is noted that the mean of the 100 sample values of r was 
−.0596 and that the standard deviation was .5076. Even with samples 

TABLE 16 
DISTRIBUTION OF 100 CORRELATION COEFFICIENTS AND THE TEST OF GOODNESS OF 
FIT OF THE THEORETICAL FOR OBSERVED VALUES OF r 

                                       Pooled   Pooled 
Class interval         f_o     f_t      f_o      f_t    f_o - f_t   (f_o - f_t)^2   (f_o - f_t)^2/f_t 
  .805 to 1.000          3     5.01 
  .605 to  .805          8     8.97      11     13.98     -2.98        8.8804           0.635 
  .405 to  .605         10    10.96      10     10.96     -0.96        0.9216           0.084 
  .205 to  .405         11    12.10      11     12.10     -1.10        1.2100           0.100 
  .005 to  .205         18    12.64      18     12.64      5.36       28.7296           2.273 
 -.195 to  .005          6    12.65       6     12.65     -6.65       44.2225           3.496 
 -.395 to -.195         14    12.14      14     12.14      1.86        3.4596           0.285 
 -.595 to -.395          8    11.03       8     11.03     -3.03        9.1809           0.832 
 -.795 to -.595         15     9.10      15      9.10      5.90       34.8100           3.825 
-1.000 to -.795          7     5.40       7      5.40      1.60        2.5600           0.474 
Total                  100   100.00     100    100.00      0.00                   χ² = 12.004 

d.f. = 8; P > .14 


of n = 5, these values agree closely with the expected values of 0 and 
.500. At least for large samples, the normal curve might be used for the 
mathematical model when the true value of $\rho = 0$, against which the 
experimental results might be compared. An exact test, however, is 
available based on the t-distribution as outlined above. In this case, 

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \tag{3.16}$$

This test is particularly useful for small samples. 
When the correlation in the population is not zero, that is, when $\rho \neq 0$, 
the sampling distribution of r is distributed about $\rho$ with a standard 
deviation, or standard error, approximately equal to $(1 - \rho^2)/\sqrt{n - 1}$. 
When $\rho = 0$, this reduces to the standard deviation given above, or 
$1/\sqrt{n - 1}$. With large samples and moderate or small values of $\rho$ the 
sample value r may be substituted for the unavailable $\rho$, for example, 
$(1 - r^2)/\sqrt{n - 1}$ as a measure of the sampling error of r. With small 
samples, however, the sample value, r, often differs greatly from the true 
value. Furthermore, the sampling distribution departs widely from 
normality so that the test of significance based upon the formula for large 
samples may be highly misleading. The constants of distribution of r 
for samples of n = 20 from a normal population as given in Table 17 
are illustrative of this point.⁵ 


TABLE 17

CONSTANTS OF DISTRIBUTION OF r IN SAMPLES (N = 20) FROM A NORMAL POPULATION

ρ                      0       .2      .4      .6      .8      .9
β₁                   0.000   0.066   0.260   0.650   1.400   2.060
β₂                   2.710   2.820   3.170   3.910   5.420   6.870
σ_r                  0.229   0.221   0.197   0.154   0.091   0.049
(1 − ρ²)/√(N − 1)    0.229   0.220   0.193   0.147   0.083   0.044


Fisher solved these and related problems by using the transformation

$$z' = \tanh^{-1} r = \tfrac{1}{2}\log_e \frac{1+r}{1-r} \tag{3.17}$$

z' is to a first approximation normally distributed about the population
value ζ + ρ/[2(n − 1)], where ζ = tanh⁻¹ ρ, for all values of ρ with a
standard deviation 1/√(n − 3). The form of the distribution of z' is
nearly independent of the value of ρ in the population. The close
approximation to normality of the z'-distribution is noted from the
constants of distribution of z' given in Table 18.


TABLE 18

CONSTANTS OF DISTRIBUTION OF z' IN SAMPLES (N = 20) FROM A NORMAL POPULATION

ρ      Mean (z' − ζ)     σ_z'      β₁       β₂
0         .0000          .2423    .0000    3.116
.2        .0053          .2422    .0000    3.117
.6        .0159          .2412    .0000    3.118
.9        .0249          .2398    .0000    3.114
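The first-order approximations quoted for z' can be set beside the constants of Table 18 with a brief sketch. It assumes the approximate mean excess ρ/[2(n − 1)] and standard deviation 1/√(n − 3); the function names are illustrative, and the tabled constants, which rest on more exact calculation, need not agree beyond about the third decimal.

```python
import math

def fisher_z(r):
    """Formula (3.17): z' = tanh^-1 r = (1/2) log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_bias(rho, n):
    """First-order excess of the mean of z' over zeta: rho / (2(n - 1))."""
    return rho / (2 * (n - 1))

def z_standard_error(n):
    """Approximate standard deviation of z': 1 / sqrt(n - 3)."""
    return 1 / math.sqrt(n - 3)

n = 20
for rho in (0.0, 0.2, 0.6, 0.9):
    print(f"rho = {rho:.1f}  zeta = {fisher_z(rho):.4f}  "
          f"bias = {z_bias(rho, n):.4f}  s.e. = {z_standard_error(n):.4f}")
```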




Other Uses of Statistical Models. We have now illustrated how the
statistician builds up statistical or mathematical models against which
experimental results may be checked with a view to examining their
significance. In order for the reader to gain an insight into the process, a
series of empirical sampling experiments was presented. The three
principal models illustrated were the normal, the t, and the chi-square.
Certain other uses of one or another of these models and of models not
previously illustrated for the research worker will now be considered.

⁵ See page 149 for criteria of normality.

The Difference between Correlation Coefficients. The research worker is 
frequently interested in comparing the relative intensity of relationships 
for different characters. Although an exact test of significance is not 
available for such purposes, a test based on Fisher’s z'-transformation 
of the correlation coefficient is valuable and sufficiently accurate for most 
practical problems (Ref. 3). Let 


$$z'_1 = \tfrac{1}{2}\log_e \frac{1+r_1}{1-r_1}, \qquad z'_2 = \tfrac{1}{2}\log_e \frac{1+r_2}{1-r_2}$$

where r₁ and r₂ are two correlation coefficients calculated from random
samples of n₁ and n₂ individuals, respectively.

z'₁ − z'₂ varies normally about ζ₁ − ζ₂ (= 0) with standard
deviation √[1/(n₁ − 3) + 1/(n₂ − 3)]. Therefore, the quantity

$$X = \frac{z'_1 - z'_2}{\sqrt{\dfrac{1}{n_1-3} + \dfrac{1}{n_2-3}}} \tag{3.18}$$

may be assumed to be normally distributed about zero with a standard
deviation of unity when the true correlation coefficients in the sampled
parent population are in fact equal. The sampling distribution, then, of
X in repeated sampling may be assumed to be normal, and the experimental
result, X₀, may be compared against the normal model.
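A brief sketch of the test in Formula (3.18) follows. The two correlation coefficients and sample sizes are hypothetical, the function names are illustrative, and the two-sided probability is taken from the unit normal curve by way of the complementary error function.

```python
import math

def fisher_z(r):
    """z' = (1/2) log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def normal_upper_tail(x):
    """P(X >= x) for a unit normal variate."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def correlation_difference(r1, n1, r2, n2):
    """Formula (3.18): returns X and the two-sided normal probability."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    x = (fisher_z(r1) - fisher_z(r2)) / se
    return x, 2 * normal_upper_tail(abs(x))

# Hypothetical data: r = .60 from 53 individuals, r = .30 from 103.
x0, p = correlation_difference(0.60, 53, 0.30, 103)
print(f"X0 = {x0:.3f}, two-sided P = {p:.3f}")
```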

The Combination of Correlation Coefficients. The z'-transformation is
valuable for use in problems involving the averaging of several sample
values of r from the same population in order to get the combined
estimate of ρ. Thus the weighted arithmetical mean is

$$\bar{z}' = \frac{(n_1-3)z'_1 + (n_2-3)z'_2 + \cdots}{(n_1-3) + (n_2-3) + \cdots} \tag{3.19}$$

and the standard error of z̄' is

$$\sigma_{\bar{z}'} = \frac{1}{\sqrt{(n_1-3) + (n_2-3) + \cdots}} \tag{3.20}$$

The ratio X₀ = z̄'/σ_z̄' may then be referred to the normal model to
determine the probability that a value as great as or greater than X₀
could be obtained in repeated sampling by random sampling errors alone.
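The weighting in Formulas (3.19) and (3.20) may be sketched as follows. The three (r, n) pairs are hypothetical, the function names are illustrative, and converting the mean z' back to a correlation with tanh is simply the inverse of Formula (3.17), added here as an assumption about how the combined estimate would be reported.

```python
import math

def fisher_z(r):
    """z' = (1/2) log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def combine_correlations(samples):
    """samples: list of (r, n). Returns (weighted mean z', its standard error)."""
    weights = [n - 3 for _, n in samples]
    total = sum(weights)
    z_bar = sum(w * fisher_z(r) for w, (r, _) in zip(weights, samples)) / total
    return z_bar, 1 / math.sqrt(total)

# Hypothetical r-values from three samples of the same population.
z_bar, se = combine_correlations([(0.45, 28), (0.52, 43), (0.38, 35)])
print(f"mean z' = {z_bar:.4f}, s.e. = {se:.4f}, X0 = {z_bar / se:.2f}")
print(f"combined estimate of rho = {math.tanh(z_bar):.3f}")
```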
Correlations on the Same Sample. Comparisons are sometimes made
among correlation coefficients based on the same sample. Hotelling
(Ref. 4) has given the exact solution of the problem of testing the
significance of the difference between r_y1 and r_y2 under the conditions
that the significance is to be interpreted with respect to subpopulations
of possible samples for which predictors X₁ and X₂ take the same set of
values as those found in the obtained sample. Thus, F, or the variance
ratio, is

$$F = \frac{(r_{y1} - r_{y2})^{2}\,(N-3)(1 + r_{12})}{2\,(1 - r_{12}^{2} - r_{y1}^{2} - r_{y2}^{2} + 2\,r_{12}\,r_{y1}\,r_{y2})} \tag{3.21}$$

n₁ = 1;  n₂ = N − 3

where r_y1 is the correlation coefficient between the predictor X₁ and the
predictand y; r_y2, between X₂ and y; and r₁₂ is the correlation between
X₁ and X₂.
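Formula (3.21) can be evaluated with the short sketch below. The three correlations and the sample size are hypothetical, and the function name is illustrative; the resulting F would be referred to the variance-ratio table with n₁ = 1 and n₂ = N − 3 degrees of freedom.

```python
def hotelling_f(r_y1, r_y2, r_12, n):
    """Formula (3.21); F has n1 = 1 and n2 = N - 3 degrees of freedom."""
    determinant = 1 - r_12**2 - r_y1**2 - r_y2**2 + 2 * r_12 * r_y1 * r_y2
    f = (r_y1 - r_y2) ** 2 * (n - 3) * (1 + r_12) / (2 * determinant)
    return f, 1, n - 3

# Hypothetical correlations from one sample of N = 50.
f, n1, n2 = hotelling_f(r_y1=0.60, r_y2=0.40, r_12=0.50, n=50)
print(f"F = {f:.3f} with n1 = {n1} and n2 = {n2}")
```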

The assumptions underlying the test are that (1) y has the univariate
normal distribution for each set of values of X₁ and X₂, independently
for the different sets, with (2) a common variance σ² and (3) linear
regression of y on X₁ and X₂, respectively.

Hotelling also developed formulas for determining the selection of (a)
one variate from among three or more and (b) additional variates when
some have been chosen. His principal solutions of tests of significant
differences among r_y1, ..., r_yp are given in Ref. 4.

Fisher’s z-Distribution and the Related F-Distribution. A mathematical
model which has played an important role in modern statistical
analysis is the z-distribution developed by Fisher.

The quantity z is equal to one-half the difference of the natural 
logarithms of two independent estimates of the same population variance, 
or to the difference of the natural logarithms of the corresponding stand- 
ard deviations. This distribution serves as the model against which 
tests of significance of experimental results attained in the analysis of 
variance and in multiple regression problems (to be discussed later) are 
compared. 

Thus, suppose we have two samples of sizes N₁ and N₂, each drawn
at random from one of two populations of variates normally distributed
with equal population variances σ².

Compute from the two samples

$$s_1^2 = \frac{\sum_{i=1}^{N_1} (X_{1i} - \bar{X}_1)^2}{n_1} \qquad\text{and}\qquad s_2^2 = \frac{\sum_{i=1}^{N_2} (X_{2i} - \bar{X}_2)^2}{n_2}$$

where X̄₁ and X̄₂ are the respective means; s₁² and s₂² are the respective
variance estimates; and n₁ = N₁ − 1, n₂ = N₂ − 1. Then

$$z = \tfrac{1}{2}\log_e \frac{s_1^2}{s_2^2} = \log_e \frac{s_1}{s_2} \tag{3.22}$$
2 
z is distributed in the form

$$y = y_0\,\frac{e^{n_1 z}}{(n_1 e^{2z} + n_2)^{\frac{1}{2}(n_1+n_2)}} \tag{3.23}$$

where y₀ may be taken such that the area of the curve is unity, and the
experimental value, z₀, may be compared with the model to determine
the probability that values of z equal to or greater than z₀ could be
obtained by random sampling errors alone. The probability P will be
given by the area under the curve to the right of the ordinate erected at z₀.

Fisher (Ref. 3) has computed tables giving values of z corresponding
to different values of n₁ and n₂, and P, namely, the 5, 1, and 0.1 per cent
points of the z-distribution. It should be pointed out that the table
gives the values of z at which ordinates cut off “tails” of 5, 1, and 0.1 per
cent of the total area of the curve for values of n₁ and n₂ chosen so that
n₁ corresponds to the number of degrees of freedom associated with the
larger of the two estimates of variance.

The z-distribution is unimodal and symmetrical if n₁ = n₂. For large
values of n₁ and n₂ and also for moderate values when n₁ and n₂ are equal
or nearly equal, the distribution of z becomes nearly normal about a
mean of zero with a standard deviation, or standard error,

$$\sigma_z = \sqrt{\tfrac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$


It is to be noted that z is a Studentized function; hence it is especially
appropriate for small samples. The z-test may be regarded as an
extension of the t-test to situations where more than two variants are
under comparison. In fact, Fisher (Ref. 2) has shown that the normal
curve, the χ²-distribution, and Student’s distribution are included as
special cases of the two-parameter family of curves represented by the
z-distribution. For instance, since z = log_e t, the values for n₁ = 1 in
the table of z are the logarithms of the values for P = .05 and P = .01 in
the table of t (Ref. 3).

Tables of the variance ratio

$$F = \frac{s_1^2}{s_2^2} = e^{2z} \tag{3.24}$$

are available (see Table IV, Appendix) and are coming to be more
commonly used than the table of z, since the troublesome logarithmic
transformation is thereby avoided. Against this advantage perhaps is the
advantage of greater accuracy in the use of the z-tables when
interpolations are required. Tables for seven points of the F distribution
are now available (Ref. 6).
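The relation between z and the variance ratio F in Formulas (3.22) and (3.24) is illustrated by the sketch below. The two sets of measurements are hypothetical and the function name is illustrative; the larger variance estimate is placed in the numerator so that n₁ belongs to it, as the tables require.

```python
import math
from statistics import variance

def fisher_z_statistic(sample_1, sample_2):
    """Return (z, F, n1, n2), with the larger variance estimate on top."""
    s1, s2 = variance(sample_1), variance(sample_2)   # divisors N - 1
    n1, n2 = len(sample_1) - 1, len(sample_2) - 1
    if s1 < s2:
        s1, s2, n1, n2 = s2, s1, n2, n1
    f = s1 / s2                  # Formula (3.24)
    z = 0.5 * math.log(f)        # Formula (3.22)
    return z, f, n1, n2

# Hypothetical measurements.
a = [12.1, 14.3, 9.8, 15.2, 11.7, 13.9]
b = [10.4, 10.9, 11.3, 10.1, 11.0, 10.6, 10.8]
z, f, n1, n2 = fisher_z_statistic(a, b)
print(f"z = {z:.4f}, F = e^(2z) = {f:.3f}, n1 = {n1}, n2 = {n2}")
```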

The Binomial Distribution in Sampling Theory. We have previously 
described the binomial distribution and indicated that the normal dis- 
tribution may be used as an approximation to it (see page 27). Since 
this distribution plays such a significant role in sampling theory, it should
be considered somewhat more broadly. 

A Sampling Experiment Leading to the Binomial Distribution. We 
begin by presenting the results of a simple sampling experiment consist- 
ing of the tossing of 10 coins 512 times. 

The record of this experiment is available in Table 19. The observed
frequencies, f_o, of the several proportions of successes (that is, of the
proportion of tails, X_i) are given in column 2. The calculations for the
mean and standard deviation of the proportion of successes are given in
columns 3 and 4. It is found that the mean X̄ = 0.5 and that the standard
deviation s = .162. The corresponding theoretical values are 0.5 and
.158, respectively.


TABLE 19

THE TEST OF GOODNESS OF FIT OF THE THEORETICAL BINOMIAL DISTRIBUTION FOR
THE OBSERVED DISTRIBUTION OF SUCCESSES (THE PROPORTION OF TAILS) FROM
512 TOSSES OF 10 COINS AT A TIME

 X_i     f_o    f_o·X_i   f_o·X_i²      f_t            f_o − f_t   (f_o − f_t)²/f_t
 (1)     (2)      (3)       (4)         (5)               (6)            (7)

 1.0       2      2.0       2.00        0.5 }   5.5        1.5          0.4091
 0.9       5      4.5       4.05        5.0 }
 0.8      15     12.0       9.60       22.5               −7.5          2.5000
 0.7      68     47.6      33.32       60.0                8.0          1.0667
 0.6     105     63.0      37.80      105.0                0.0          0.0000
 0.5     134     67.0      33.50      126.0                8.0          0.5079
 0.4      95     38.0      15.20      105.0              −10.0          0.9524
 0.3      55     16.5       4.95       60.0               −5.0          0.4167
 0.2      23      4.6       0.92       22.5                0.5          0.0111
 0.1       8      0.8       0.08        5.0 }   5.5        4.5          3.6818
 0.0       2      0.0       0.00        0.5 }

Total    512    256.0     141.42      512.0                       χ₀² = 9.5457

d.f. = 8;  .30 > P > .20

Sample values:
    X̄ = 256/512 = 0.5
    s = √(141.42/512 − .25) = .162

Population values:
    μ = 0.5
    σ = √(.5 × .5/10) = .158


In column 5 the theoretical values f_t are given. Finally, we tested
the agreement between the observed and theoretical values by means
of the χ²-test [column (7)]. We wish to test for goodness of fit and
enter the χ²-table (Table III, Appendix) with χ₀² = 9.5457 and 8 degrees
of freedom. It is found that P > .20. We conclude that the observed
distribution may be regarded as in accordance with the binomial
distribution law; that is, the discrepancies between the observed and
theoretical frequencies may be attributable to random sampling
fluctuations. The theoretical basis of the binomial distribution is
given below.
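The figures of Table 19 can be reproduced with a short sketch. It assumes the observed frequencies of column 2, theoretical frequencies equal to 512 × C(10, t)/2¹⁰, and the pooling of the two classes at each end of the distribution; the variable and function names are illustrative. The theoretical σ printed is the direct evaluation of √(.5 × .5/10).

```python
import math

n_coins, n_tosses = 10, 512
observed = [2, 5, 15, 68, 105, 134, 95, 55, 23, 8, 2]   # tails = 10, 9, ..., 0
tails = list(range(10, -1, -1))

# Theoretical frequencies: 512 * C(10, t) / 2^10.
expected = [n_tosses * math.comb(n_coins, t) / 2 ** n_coins for t in tails]

# Sample mean and standard deviation of the proportion of tails.
props = [t / n_coins for t in tails]
mean = sum(f * x for f, x in zip(observed, props)) / n_tosses
sd = math.sqrt(sum(f * x * x for f, x in zip(observed, props)) / n_tosses
               - mean ** 2)
sigma = math.sqrt(0.5 * 0.5 / n_coins)

def pool_ends(freqs):
    """Pool the two classes at each end, as Table 19 does."""
    return [freqs[0] + freqs[1]] + freqs[2:-2] + [freqs[-2] + freqs[-1]]

obs_p, exp_p = pool_ends(observed), pool_ends(expected)
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_p, exp_p))
print(f"mean = {mean:.3f}, s = {sd:.3f}, sigma = {sigma:.3f}")
print(f"chi-square = {chi2:.4f} with {len(obs_p) - 1} d.f.")
```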

The Binomial Expansion. Assume that we take N random samples
each of size n and in each of which a specific number t possess a given
character a and the remainder n − t do not possess the character. Let
p be the probability that an individual possesses the character and let
1 − p = q; then the frequencies of samples with t = 0, 1, 2, ···, n are
given by terms in the series N(q + p)ⁿ; that is,

$$N\left[q^{n} + n q^{n-1} p + \frac{n(n-1)}{2!}\, q^{n-2} p^{2} + \cdots + \binom{n}{t} q^{n-t} p^{t} + \cdots + p^{n}\right] \tag{3.25}$$


The terms in the expansion (q + p)ⁿ are relative frequencies in the
frequency distribution of all possible different samples, classified by
number of successes, say t, that may be drawn from the population
according to the rules of simple random sampling. The distribution
may be called the sampling distribution of the number of successes,
t = 0, 1, 2, ···, t, ···, n. It is more commonly known as the binomial
distribution, since it results from the expansion of (q + p)ⁿ.

The mean of the distribution is given by

$$\mu = np \tag{3.26}$$

and the variance

$$\sigma^{2} = npq \tag{3.27}$$

If instead of the actual number of t's in each sample the proportion
of t's, that is, t/n of the number in each sample, is recorded, the mean
proportion of t's would be

$$\mu = \frac{np}{n} = p \tag{3.28}$$

and the variance

$$\sigma^{2} = \frac{npq}{n^{2}} = \frac{pq}{n} \tag{3.29}$$

The standard deviation of the sampling distribution or the standard
error provides a basis for judging the exceptionalness of any obtained
sample, as illustrated in the following example.

Example 2. The Measurement of Exceptionalness. Assume that in
a random sample of 50 individuals, 16 have a character, say A. Is this
exceptional? It is known that in the general population 20 per cent
possess the character A.

The exact probability that 16 or more of a random sample of 50
individuals would have character A is given by the sum of the terms
of the expansion of the binomial (4/5 + 1/5)⁵⁰ from the seventeenth term
onward. This probability equals .031. This method, though exact, is
extremely laborious. For this reason it is advantageous to use an
alternative method.

If p = q = ½ and n is large, the area under the appropriate section
of a normal curve gives a close approximation to the point distribution
of the binomial. Departures from the given conditions result in less
accurate approximations. For example, if either p or q is small and n is
not large, the approximation could be rather crude. A practical
procedure for determining the relative values of p and q for a given n if
the normal curve may be expected to represent the binomial is the
following:

The mean of the distribution should be, say, three standard deviations
from the start. Thus we want

$$\begin{aligned}
np &> 3\sqrt{npq} \\
n^{2}p^{2} &> 9npq \\
np &> 9(1 - p) \\
p(n + 9) &> 9 \\
p &> \frac{9}{n + 9}
\end{aligned}$$

If, for example, n = 50, then

$$p > \tfrac{9}{59} > .15$$
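The rule just derived is easily tabulated. The sketch below, with illustrative function names and arbitrarily chosen values of n, gives the lower bound 9/(n + 9) and checks the original condition np > 3√(npq) directly.

```python
import math

def smallest_p_for_normal(n):
    """Lower bound on p from np > 3*sqrt(npq), that is, p > 9 / (n + 9)."""
    return 9 / (n + 9)

def mean_clears_three_sd(n, p):
    """True when the mean np lies more than three standard deviations from 0."""
    return n * p > 3 * math.sqrt(n * p * (1 - p))

for n in (20, 50, 100, 500):
    print(f"n = {n:3d}: p must exceed {smallest_p_for_normal(n):.3f}")
```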


In using the normal curve as an approximation, we proceed as follows
in the problem worked out above by the binomial expansion. Calculate:

$$X = \frac{t - np}{\sqrt{npq}} = \frac{15.5 - 10}{2.83} = 1.95$$

According to the normal table,

$$P = .025$$

This value is compared with P = .031 above.
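Both routes in Example 2 can be followed in a few lines. The sketch assumes n = 50, p = .2, and 16 or more individuals with the character; the function names are illustrative, and the normal tail is obtained from the complementary error function rather than from a printed table.

```python
import math

def binomial_upper_tail(n, p, t):
    """Exact P(number of successes >= t)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(t, n + 1))

def normal_upper_tail(x):
    """P(X >= x) for a unit normal variate."""
    return 0.5 * math.erfc(x / math.sqrt(2))

n, p, t = 50, 0.2, 16
exact = binomial_upper_tail(n, p, t)
# Normal approximation, measured from 15.5, the lower boundary of the class 16.
x = (t - 0.5 - n * p) / math.sqrt(n * p * (1 - p))
approx = normal_upper_tail(x)
print(f"exact P = {exact:.3f}; X = {x:.2f}, normal P = {approx:.3f}")
```

Carried to more decimals than the hand calculation, X is about 1.94 and the normal probability about .026, so the comparison with the exact .031 is essentially unchanged.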

The Sampling Distribution of Differences between Percentages. Fre- 
quently, the experimental results relate to the case of two samples where 
it is desired to know whether the two samples may be regarded as random 
samples from the same population. Thus: 


(1) In sample 1 of size n₁, there are t₁ individuals that have the
character A.

(2) In sample 2 of size n₂, there are t₂ individuals that have the
character A.

Could the two samples be random samples from populations in which
p (the probability of character A occurring) is the same? Thus: do the
observed proportions t₁/n₁ and t₂/n₂ differ significantly?
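The theoretical or mathematical model against which such experimental
results may be compared can be built up as follows: Assuming
that p₁ = p₂ = p, the variation in t₁ in repeated samples of n₁ follows the
binomial (q + p)^n₁; similarly, the variation in t₂ in repeated samples of
n₂ follows the binomial (q + p)^n₂.

t₁ varies about a mean of n₁p with a standard deviation √(n₁pq);
t₁/n₁ varies about a mean of p with a standard deviation

$$\frac{1}{n_1}\sqrt{n_1 pq} = \sqrt{\frac{pq}{n_1}}$$

That is, the mean in repeated sampling is p;

$$E\left[\frac{t_1}{n_1}\right] = p, \qquad E\left[\frac{t_1}{n_1} - p\right] = 0, \qquad E\left[\left(\frac{t_1}{n_1} - p\right)^{2}\right] = \frac{pq}{n_1}$$

Similarly,

$$E\left[\frac{t_2}{n_2}\right] = p, \qquad E\left[\left(\frac{t_2}{n_2} - p\right)^{2}\right] = \frac{pq}{n_2}$$

Also,

$$E\left[\left(\frac{t_1}{n_1} - p\right)\left(\frac{t_2}{n_2} - p\right)\right] = 0$$

Consider:

$$d = \frac{t_1}{n_1} - \frac{t_2}{n_2}$$

$$E[d^{2}] = E\left[\left(\frac{t_1}{n_1} - p\right)^{2} - 2\left(\frac{t_1}{n_1} - p\right)\left(\frac{t_2}{n_2} - p\right) + \left(\frac{t_2}{n_2} - p\right)^{2}\right] = \frac{pq}{n_1} + \frac{pq}{n_2}$$

$$\sigma_d^{2} = E[d^{2}] = pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \tag{3.30}$$

$$\sigma_d = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \tag{3.31}$$

The ratio

$$X = \frac{d}{\sigma_d} = \frac{\dfrac{t_1}{n_1} - \dfrac{t_2}{n_2}}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \tag{3.32}$$

in repeated sampling will be approximately normally distributed about 0
with unit standard deviation. The normal model may therefore be used
for comparing the experimental results. The complete procedure is

1. Assume the hypothesis p₁ = p₂ = p.

2. Estimate p from the data; the maximum likelihood estimate is

$$\hat{p} = \frac{t_1 + t_2}{n_1 + n_2}$$

3. Calculate the ratio

$$X = \frac{\dfrac{t_1}{n_1} - \dfrac{t_2}{n_2}}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \hat{q} = 1 - \hat{p}$$

4. Refer to the normal probability scale; consider whether the results
are compatible with the hypothesis.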
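The four steps of the procedure may be sketched as follows. The counts are hypothetical and the function names are illustrative; the pooled estimate of p is substituted in the standard error, and the two-sided probability is read from the unit normal curve.

```python
import math

def normal_upper_tail(x):
    """P(X >= x) for a unit normal variate."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def proportion_difference(t1, n1, t2, n2):
    """Pooled estimate of p, the ratio X, and the two-sided normal P."""
    p_hat = (t1 + t2) / (n1 + n2)
    q_hat = 1 - p_hat
    se = math.sqrt(p_hat * q_hat * (1 / n1 + 1 / n2))
    x = (t1 / n1 - t2 / n2) / se
    return p_hat, x, 2 * normal_upper_tail(abs(x))

# Hypothetical counts: 36 of 120 in sample 1 and 30 of 150 in sample 2.
p_hat, x, p_value = proportion_difference(36, 120, 30, 150)
print(f"pooled p = {p_hat:.4f}, X = {x:.2f}, two-sided P = {p_value:.3f}")
```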


PROBLEMS 


The following problems are designed to give the student an under- 
standing of sampling and sampling errors. The following normal popu- 
lation of numbers with a mean of 30 and a variance of 100 may be used 
for the exercises. Write or type each of the 100 numbers on a small 
square of cardboard or stiff paper. Place the 100 pieces in a box and mix 
thoroughly. Then draw one card at random from the box and record the 
number on it. Return the card to the box. Mix the cards again, draw
a second card and record its number, and so on until a sample of a speci- 
fied size has been obtained. 


FREQUENCY DISTRIBUTION OF NUMBERS, X's, IN A NORMAL POPULATION WITH μ = 30;
σ² = 100

  X    f       X    f       X    f       X    f
 57    1      39    3      28    3      15    1
 53    1      38    2      27    3      14    1
 49    1      37    2      26    4      13    1
 48    1      36    3      25    3      12    1
 47    1      35    3      24    3      11    1
 46    1      34    3      23    2       7    1
 45    1      33    5      22    2       3    1
 44    1      32    2      21    3
 43    2      31    4      20    2
 42    3      30   10      19    3
 41    3      29    4      18    3
 40    2                   17    2
                           16    1

                                Total   100


Exercise: Selecting 20 samples of 10 at random,

1. Compute 20 means.

2. Compute 20 variances.

3. Combine to make 10 random sets of paired values of the means; of the
variances.

4. Compute 10 t's for differences between means of uncorrelated measures.

5. Take 10 samples of 5 in pairs and calculate the correlation coefficients.

6. Combine the results of the individual students in each case, form the
frequency distribution of the statistic, and plot the histogram. Calculate
the mean and standard deviation of each distribution and compare
with the population and expected values.
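For readers who prefer to draw the samples by machine rather than from a box of cards, the first two parts of the exercise can be sketched as below. The population is entered as read from the frequency table above; with these frequencies the total is 100, the mean 30, and the variance 100. The seed is arbitrary and the function name is illustrative.

```python
import random
from statistics import mean, pvariance, variance

# The 100-number population tabulated above, as {X: f}.
population_table = {
    57: 1, 53: 1, 49: 1, 48: 1, 47: 1, 46: 1, 45: 1, 44: 1, 43: 2, 42: 3,
    41: 3, 40: 2, 39: 3, 38: 2, 37: 2, 36: 3, 35: 3, 34: 3, 33: 5, 32: 2,
    31: 4, 30: 10, 29: 4, 28: 3, 27: 3, 26: 4, 25: 3, 24: 3, 23: 2, 22: 2,
    21: 3, 20: 2, 19: 3, 18: 3, 17: 2, 16: 1, 15: 1, 14: 1, 13: 1, 12: 1,
    11: 1, 7: 1, 3: 1,
}
population = [x for x, f in population_table.items() for _ in range(f)]

def draw_sample(size, rng):
    """Draw with replacement, as when each card is returned to the box."""
    return [rng.choice(population) for _ in range(size)]

rng = random.Random(1949)          # arbitrary seed so the run is repeatable
samples = [draw_sample(10, rng) for _ in range(20)]
means = [mean(s) for s in samples]
variances = [variance(s) for s in samples]   # divisor N - 1

print("population mean and variance:", mean(population), pvariance(population))
print("mean of the 20 sample means:", round(mean(means), 2))
print("mean of the 20 sample variances:", round(mean(variances), 2))
```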
CHAPTER IV
THE TESTING OF STATISTICAL HYPOTHESES


The Role of the Hypothesis in Scientific Investigations. In the
well-developed empirical sciences, scientific procedure is primarily
concerned with deriving predictions the validity of which is tested by the
results of
experiments. The modern development of science has been much 
facilitated by the practice of using hypotheses in planning and guiding 
scientific inquiries. Even a casual study of the investigations of eminent 
scientists reveals that they were guided in their work by some theory and 
that this theory guided their observations and experiments. Where the 
theory proved inadequate, it was modified. Occasionally it was aban- 
doned completely, but then another was sought to plan the action. 
Significant experimentation requires the guidance of a hypothesis, and a 
successful experimenter does not collect observations unguided by theory 
to which the facts are related. Bacon maintained that if enough instances
are gathered and tabulated correctly, the principle which explains them 
will simply emerge without any hypothesis about them having been 
formed. This contention has not been proved by the experimenter. 

The working hypothesis plays an important part in statistical research. 
It serves as a guide in planning the investigation; in determining what 
data to collect; in classifying, ordering, and reducing them; and finally 
as the basis for formulating the judgments with respect to it. 

Similar to the working plan of Newton, the scientist who did not 
formulate hypotheses needlessly (“hypotheses non fingo"), or to the meta- 
physical requirements of Ockham, the logician who considered it needless 
to recur to many entities when it was possible to get along with fewer 
ones ("nunquam ponenda est pluralitas sine necessitate"), the statistician’s 
preferred method is to test the simplest hypothesis and to hold to a
minimum number of new quantities or constructs. Thus the preferred 
hypothesis used by the statistician in the examination of his data is that 
the apparent variations and the estimates of presumed effects may be 
attributable to random sample errors or to fortuitous factors rather than 
to the action of new causes. This hypothesis can be tested by the 
application of the theory of errors. It will be recalled that the statistical 
models previously described were constructed on the basis of sampling
errors. As long as experimental results conform to these models, the 
hypothesis of chance (or, more specifically in this case, sampling errors) 
being the cause of the observed effects is accepted. 

The hypothesis that chance factors may have given rise to an observed
effect is frequently spoken of as the null hypothesis. This hypothesis is
met with in a number of different forms in research or statistical work. 
In experimentation, for example, it is often desirable to compare the 
effects of various methods of treatment or of production. The null 
hypothesis can in these cases be stated as follows: There is no difference 
in the outcomes of the several treatments, or, The outcomes are the same. 
This method is equivalent to determining whether or not the observed 
difference should be ascribed to random fluctuations or judged to be 
significant, that is, ascribed to the differential treatment. The null 
hypothesis assumes the former alternative, which, if found to be incom- 
patible with the facts of observation, is then rejected. More generally, 
the null hypothesis may be stated thus: Can the samples under examina- 
tion be regarded as having been randomly chosen from the same or sim- 
ilar population? 

General Theory of Testing Statistical Hypotheses. In Chapter III,
Sampling Distributions, it was stated that the statistician developed
mathematical models against which the research worker could compare
his experimental results and draw conclusions with respect to their
significance. The process of determining statistical significance was said
to consist in comparing the numerical data (or some function of them)
obtained in a particular experiment with the model to establish whether
or not they conform to the model. The name applied to the process of
examining the significance of the data is the test of significance. In
dealing with the sampling distribution we were concerned with testing
the agreement between the distribution of our set of sample values and a
theoretical distribution. In this case, we spoke of a test of the goodness
of fit.

More recently, we have come to speak of the problem of testing
statistical hypotheses and thus to speak of the test of significance relative
to the hypothesis in question. Before proceeding to illustrate the
application of these tests to some practical problems met with by the
research worker, we shall describe briefly the theoretical basis underlying
current procedures in testing a statistical hypothesis.

Suppose that the variable X is the measurement of a certain
character and that a number of repeated measurements are made, so that
we obtain N random variables X₁, X₂, ···, X_N. The variables are
assumed to be independently distributed, and the set of values is said to
be a sample of N independent observations on X. The sample of N
observations may be represented by a sample point, E, in the
N-dimensional space having as its coordinates X₁, X₂, ···, X_N. The
space in which the point lies may be called the sample space, W.
Assume that the distribution of X is normal and that there are some
parameters θ₁, ···, θ_q specifying the population distribution. Any
assumption about the unknown parameters θ₁, ···, θ_q is called a
statistical hypothesis. The statistical hypothesis, H₀, is called a
simple hypothesis if it determines completely the values of all the
θ-parameters, for example, if it specifies that θ₁ = 1, θ₂ = 3, ···. If the
hypothesis is consistent with more values than one for some parameter
it is called a composite hypothesis; for instance, the hypothesis that
θ₁ = θ₂ for a distribution of X determined by two unknown parameters is a
composite hypothesis.

For simplicity, we shall consider the case of a single unknown
parameter. That is, let us assume that only one unknown parameter, θ, is
involved in the distribution function of X and θ, or F(X₁, X₂, ···, X_N, θ).
We wish to test the null hypothesis, H₀: θ = θ₀, against the only
admissible alternative hypothesis, H₁: θ = θ₁. For example, we may
test the significance of the deviation in the mean of a sample on the
basis of a random sample of N independent observations X₁, ···, X_N
from a normal population X. Then H₀ is the hypothesis that X is
normally distributed about the mean θ₀ with standard deviation σ, and
H₁ is the hypothesis that X is normally distributed about the mean θ₁
with standard deviation σ.

The testing of the statistical hypothesis involves the choice of a
region, w, called critical in the sample space W. It also involves the
decision to reject the hypothesis if and only if the sample point E falls
in w. Therefore, the test of the statistical hypothesis, H₀, consists in
rejecting H₀ when the sample point, E, falls within a specified critical
region, w₀, and in accepting H₀ (or at least not rejecting it) if the point
falls outside w₀. The fundamental problem is, therefore, the
specification of the critical region, w₀.

The principle upon which the choice of the critical region depends
was first advanced by Neyman and Pearson (Ref. 3). It is based on the
control and minimizing of two kinds of error involved in testing the
hypothesis, H₀: (1) the unjust rejection of the hypothesis, described as
an error of the first kind, and (2) the failure to reject the hypothesis when,
in fact, it is incorrect, that is, when some other hypothesis, H₁, is true,
designated as an error of the second kind.

The probability of an error of the first kind determined by the
hypothesis under test, say H₀, is called the size of the corresponding
critical region, w₀, and is given by P{E ∈ w₀ | H₀}, that is, the probability
that E, as determined by the observational values, will fall within the
region, w₀, as determined by the hypothesis, H₀. This probability may
be designated by α.

The probability of an error of the second kind is P{E ∈ (W − w₀) | H₁},
where (W − w₀) is the set of all sample points outside w₀. It may be
specified as β. The complementary probability, 1 − β, is called the power
of the test with respect to H₁.

Neyman and Pearson (Ref. 4), assuming that P(X₁, ···, X_N | H₀)
and P(X₁, ···, X_N | H₁) are the probability laws of the X's as fixed
by the hypothesis, H₀, being tested, and by H₁, a single alternative, have
shown that the region w₀, established by the inequality

$$P(X_1, \cdots, X_N \mid H_1) > k\,P(X_1, \cdots, X_N \mid H_0) \tag{4.01}$$

when k > 0 is a constant selected such that

$$P\{E \in w_0 \mid H_0\} = \alpha \tag{4.02}$$

is the best critical region with regard to H₁ having size α. The critical
region, which provides the most powerful test with respect to H₁, is called
the best critical region for H₀ with respect to H₁.

This theory of testing statistical hypotheses is based on the simple
principle of arranging the test, that is, of choosing the critical region, w₀,
so as to minimize the probability of errors of the second kind while
keeping the probability of errors of the first kind constant. The size of
the critical region is then determined by α and its power is designated as
1 − β. It is obviously impossible to make both α and β arbitrarily small.
The decision of just how the balance between the two kinds of errors
should be struck must be made by the investigator and will presumably
be based on the relative importance of the two kinds of error in the
particular situation. It is the function of statistical theory to show how
the two risks of error may be controlled and minimized.
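The balance between the two kinds of error can be made concrete with a small sketch for a one-sided test of a normal mean with known σ. The values θ₀ = 0, θ₁ = 1, σ = 2, and N = 25 are hypothetical, and the function name is illustrative; the point is only that, for a fixed N, reducing α raises β.

```python
from statistics import NormalDist

def error_probabilities(theta0, theta1, sigma, n, alpha):
    """One-sided test of the mean: reject H0 when the sample mean exceeds k."""
    se = sigma / n ** 0.5
    k = NormalDist(theta0, se).inv_cdf(1 - alpha)   # boundary of the critical region
    beta = NormalDist(theta1, se).cdf(k)            # P(accept H0 | H1 true)
    return k, beta, 1 - beta

# Hypothetical case: theta0 = 0, theta1 = 1, sigma = 2, N = 25.
alphas = (0.05, 0.01, 0.001)
results = [error_probabilities(0.0, 1.0, 2.0, 25, a) for a in alphas]
for a, (k, beta, power) in zip(alphas, results):
    print(f"alpha = {a:<5}  k = {k:.3f}  beta = {beta:.3f}  power = {power:.3f}")
```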

In practice, the investigator controls the first kind of error by choosing
as the value of α, the boundary of the critical region, a specified level of
significance, say the 5 per cent, 1 per cent, or 0.1 per cent point value
of the criterion. The level is decided upon at the time of designing the
investigation and depends on the nature of the problem and the risk in
error the investigator is willing to accept. The custom is to reject the
hypothesis tested if the observed value of the criterion is greater than
(lies beyond, usually) the 1 per cent point, to remain in doubt if it lies
between the 5 per cent and 1 per cent points, and to accept the hypothesis
if the criterion is less than the 5 per cent point. With respect to the
control of the second type of error, studies of the power function of tests
have been made and tables are available for securing the probability of
errors of the second kind in some instances. Neyman and Tokarska
(Ref. 5) have compiled tables for use in determining the probability of
errors of the second kind in testing Student’s hypotheses. Tang (Ref. 6)
has tabled the power function for the test of general linear hypotheses,
which reduces to Fisher's z-test. Lehmer (Ref. 2) has prepared further
tables for determining the probability of errors of the second kind in
dealing with linear hypotheses. Eisenhart (Ref. 1) investigated the power
function of the χ²-test.

The relation between the probabilities of the two kinds of error
involved in testing the hypothesis, H₀: θ = θ₀, against the alternative,
H₁: θ = θ₁, is illustrated in Fig. 4.


The probability of accepting the hypothesis, H₀: θ = θ₀, when it is
true, is given by 1 − α. That is, the critical region, w₀, is the area to the
right of the ordinate erected at X = X₀ in the θ₀-curve; the probability
of accepting the hypothesis, H₁: θ = θ₁, when it is true, is given by 1 − β,
the area under the θ₁-curve which lies to the right of the ordinate at
X = X₀. The quantity 1 − β relative to θ₀, θ₁, and α as defined previously
is the power of the test which specifies w₀ as the critical region. Hence, α
and β represent the probabilities of the first and second kinds of error,
respectively.


Figure 4. Normal distributions of the univariates p(x, θ₁) and p(x, θ₀) with
critical regions for testing alternative hypotheses relative to the mean.

Neyman and Pearson use a criterion based on the principle of likelihood
as the basis for accepting or rejecting a given hypothesis. In the
case of the hypothesis tested above, H₀, the ratio

$$\frac{P(X_1, X_2, \cdots, X_N \mid H_1)}{P(X_1, X_2, \cdots, X_N \mid H_0)} \tag{4.03}$$

is designated as the likelihood of the hypothesis, H₀, as tested against the
single alternative hypothesis, H₁.

In accordance with Equation (4.03), a most powerful region, w₀, is
comprised of all points which satisfy the inequality

$$\frac{P(X_1, \cdots, X_N \mid H_1)}{P(X_1, \cdots, X_N \mid H_0)} > k \tag{4.04}$$

where k is selected so that the region should have the required size α
as indicated in Equation (4.02). For example, the principle for choosing
the critical region, w₀, may be applied to the case of testing the significance
of a mean of a sample from a normal population, where H₀: θ = θ₀ and
H₁: θ = θ₁. We specify that the critical region required and defined by
the inequality [Equation (4.04)] has the size α = .01.

Since, under the hypothesis H₀, the variate

$$\bar{X} - \theta_0 = \frac{1}{N}\sum_{\alpha=1}^{N} (X_\alpha - \theta_0) \tag{4.05}$$

is normally distributed about a mean of zero with variance 1/N (σ being
taken as unity), k can be read from a normal table:

$$k = \frac{2.326}{\sqrt{N}} \tag{4.06}$$

The most powerful region of size .01 is then

$$\bar{X} - \theta_0 > \frac{2.326}{\sqrt{N}} \tag{4.07}$$

that is, the test specified by the region (4.07) is most powerful with regard
to all alternatives, θ₁ > θ₀.

If the probability of an error of the first kind, α, and of the second
kind, β, is specified in a given problem, it is possible to determine the
minimum size of sample, N, for which the power of the most powerful
region of size α is equal to or greater than 1 − β. For testing H₀ against
H₁, for instance, the minimum number of observations is equal to the
smallest positive integer, N, for which

$$\beta_N(\alpha) \le \beta \qquad (4.08)$$

where β_N(α) denotes that for a fixed N, β is a single-valued function of α.

For example, (1) if the arithmetic mean, X̄, of a predetermined number
of N observations is less than or equal to a properly selected constant, k,
the hypothesis being tested, H₀, is accepted; and (2) if X̄ > k, the hypothesis,
H₀, is rejected. N and k are determined such that the probability
of (1) is equal to 1 − α when θ = θ₀ and is equal to β when θ = θ₁.
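The search for the smallest N can be sketched in Python. This is an illustration only, not part of the original exposition: it assumes a normal population with unit variance, the one-sided alternative θ₁ > θ₀, and a hypothetical alternative θ₁ = 0.5 chosen purely for the example. The function names are likewise our own.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def critical_mean(theta0, alpha, n, sigma=1.0):
    """Constant k: H0 is accepted when the sample mean is <= k."""
    return theta0 + STD_NORMAL.inv_cdf(1.0 - alpha) * sigma / math.sqrt(n)

def prob_accept_h0(theta, k, n, sigma=1.0):
    """Probability that the mean of n observations is <= k when the true mean is theta."""
    return NormalDist(theta, sigma / math.sqrt(n)).cdf(k)

def min_sample_size(theta0, theta1, alpha, beta, sigma=1.0):
    """Smallest N for which the second-kind error of the size-alpha test is <= beta."""
    n = 1
    while prob_accept_h0(theta1, critical_mean(theta0, alpha, n, sigma), n, sigma) > beta:
        n += 1
    return n

# Hypothetical illustration: theta0 = 0, theta1 = 0.5, alpha = .01, beta = .05.
n_min = min_sample_size(0.0, 0.5, 0.01, 0.05)
k = critical_mean(0.0, 0.01, n_min)
print(n_min, round(k, 4))
```

With N = 1 the function `critical_mean` returns the tabled normal deviate 2.326 used in Equation (4.06).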
Sequential Test of a Statistical Hypothesis. Recently a test has been
developed whereby the number of observations is not predetermined but
is kept as a random variable. Instead of deciding in advance the number
of items to be included in a sample, the data are analyzed continuously
as they are being collected (Ref. 7). In such cases where it is possible
to examine the data as they originate, as in some manufactured products,
the sequential probability-ratio test frequently uses half as many observations
as the current most powerful test. Briefly, the principal properties
of the sequential test are as follows:

(1) The procedure by which a sequential test of a statistical hypothesis
is carried out depends on the following rule of behavior:

(a) To accept the null hypothesis being tested.

(b) To reject the hypothesis.

(c) To suspend judgment, that is, to continue the analysis by making
an additional observation.

The test procedure is kept up sequentially until either decision (a) or (b)
is made.

(2) If α is the probability that when H₀ is true, the alternative
hypothesis, H₁, will erroneously be accepted, and if β is the probability
that when H₁ is true, H₀ will falsely be accepted, then it is necessary that
α + β < 1. Sequential analysis determines in the course of the analysis
whether or not the data justify a decision with a risk in error of judgment
as small as α or β. The number of observations necessary will, on the
average, depend on how small α and β are made; also on how fine a distinction
is made between H₀ and H₁.

(3) The fundamental criterion basic to the decision in (1) is the likelihood
ratio, L, which is the ratio of the probability that the one hypothesis
truthfully specifies the origin of the observed data to the probability that
the alternative hypothesis does. The value of L required to accept H₀ is

$$L \le \frac{\beta}{1 - \alpha};$$

that required to accept H₁ is

$$L \ge \frac{1 - \beta}{\alpha}.$$

L is computed after each observation and is compared with the critical
values necessary for a decision. These values of L are independent of the
number of observations. Since the likelihood ratio, as used in sequential
tests, is a continuing product, considerable saving in calculation results
by using log L instead of L.

In practice, the quantities α and β are usually taken as quite small,
rarely greater than .05 and frequently .01 or less.
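The rule of behavior in (1) to (3) can be sketched for the simplest case, a normal variate with known standard deviation. The sketch assumes that L is taken as the probability of the data under H₁ divided by that under H₀, so that H₀ is accepted when L ≤ β/(1 − α) and H₁ when L ≥ (1 − β)/α. The function name and the sample values are hypothetical. Logarithms are used, as recommended above, so that the continuing product becomes a running sum.

```python
import math

def sprt_normal_mean(observations, theta0, theta1, alpha, beta, sigma=1.0):
    """Sequential probability-ratio test of H0: theta = theta0 against H1: theta = theta1.

    Returns the decision and the number of observations used.
    """
    log_upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this value
    log_lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this value
    log_l = 0.0
    used = 0
    for used, x in enumerate(observations, start=1):
        # log of p(x | theta1) / p(x | theta0) for a normal variate with known sigma
        log_l += (theta1 - theta0) * (x - (theta0 + theta1) / 2.0) / sigma ** 2
        if log_l >= log_upper:
            return "accept H1", used
        if log_l <= log_lower:
            return "accept H0", used
    return "continue sampling", used

# Hypothetical data: every observation equal to 1.0, testing theta0 = 0 against theta1 = 1.
decision, n_used = sprt_normal_mean([1.0] * 20, 0.0, 1.0, 0.05, 0.05)
print(decision, n_used)
```

The two critical values depend only on α and β, not on the number of observations, which is why they are computed once before the loop.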
CHAPTER V


CURRENT PROCEDURES IN TESTING STATISTICAL 
HYPOTHESES 


Up to this point, we have defined a number of statistical models 
against which the research worker may compare his experimental results. 
We have also discussed the theoretical formulation and solution of the 
problem of testing statistical hypotheses. It is now the purpose to show 
how, in a given situation when faced with some practical problem, the 
research worker may utilize the principles underlying the theory in decid- 
ing which model, if any, is applicable in his particular problem, and how 
to choose it intelligently and effectively. This chapter will be devoted
to illustrating ways of solving a number of problems most of which are of
frequent occurrence.
Problem V.1. The significance of a mean from a known normal
population. The simplest case of testing significance is in the problem
where the population is known, that is, the population parameters, the
mean and standard deviation, are known and the quantity whose significance
we are interested in testing may be assumed to be normally
distributed in the population. Specifically, the question is: Could this
sample be a random sample from that population? Such, for instance, is
the problem of determining whether or not a given sample of pupils to
whom an intelligence test has been given could be regarded as a random
sample from the population upon whom the norms of the test were set
by the author.

Assume that it is known that for a particular intelligence test the
I.Q.'s are normally distributed about a mean of 100 with a standard
deviation of 17 I.Q. points. The test is administered to a class of 36
pupils who in other respects appeared to belong to this population. The
mean I.Q. for the class was found to be 108. May we conclude that the
class is a random sample from the specified population, that is, that the mean
ability of the class is the same as that of the population?

To answer this question we need to determine what model should be
used with which the experimental result can be compared. It is known
from sampling theory (page 36) that the means of samples of 36 cases
drawn at random from this population will be normally distributed about
the population mean, 100 I.Q. points, with a standard deviation (or
standard error) equal to 17/√36 = 2.83 I.Q. points. We found a
mean of 108 I.Q. points. How often should we expect to find a mean as
high as this or higher in repeated sampling from this population?


The answer is obtained by referring to the normal probability table
(Table I, Appendix). To enter this table we must convert the raw score
to a standard measure. Thus:

$$z = \frac{108 - 100}{2.83} = 2.82$$

From the table, we find that in repeated sampling from this population
we should expect to find a value as high as or higher than the one obtained
in 1 − .9976, or 0.24 per cent, of cases. This probability is lower than
the level of 1 per cent which we decided to use. Therefore, we conclude
that the sample could not have been drawn from the specified population.
We are aware that in making this statement we shall be wrong in 0.24 per
cent of the cases; but this is a risk we are willing to run. Such is the
statistical conclusion. The educational conclusion is that the mean
ability of the class tested is significantly above the norm specified for the
population.
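The arithmetic of this problem may be checked with a short Python sketch. It is offered only as an illustration: the function name is our own, and the normal probability table is replaced by the standard library's `NormalDist`.

```python
import math
from statistics import NormalDist

def upper_tail_z_test(sample_mean, pop_mean, pop_sd, n):
    """Standard error, standard measure, and upper-tail probability for a sample mean."""
    se = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / se
    p_upper = 1.0 - NormalDist().cdf(z)
    return se, z, p_upper

# Figures of Problem V.1: population mean 100, S.D. 17, class of 36 with mean 108.
se, z, p_upper = upper_tail_z_test(108, 100, 17, 36)
print(round(se, 2), round(z, 2), round(100 * p_upper, 2))
```

The probability is expressed in per cent in the final line so that it can be compared directly with the 1 per cent level adopted in the text.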

Problem V.2. The significance of a mean from an unknown normal 
population. We shall take next the problem in which the population 
mean is known, or specified by hypothesis to be some value, say 100 I.Q. 
points, but in which the population standard deviation is not known. 

We gave an intelligence test to a class in Grade 5 consisting of 26
pupils. The mean I.Q. score on the test was 93. We want to know if
our class may be assumed to be a random sample from a population whose
mean, μ, equals 100 I.Q. points. To answer this question we need to
compare our result with the appropriate model, which in this case must
be the distribution of t (see page 43), since we do not know the population
standard deviation. Therefore, we calculate the value of t, say t₀,
for our sample and compare it with the t-model. If we find that the
probability of getting a value of t numerically greater than or equal to t₀
(that is, beyond ±t₀) in repeated sampling is less than 1 in 100, then we
conclude that the sample could not have been drawn from this population.

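The necessary calculations and procedures are as follows:

$$N = 26; \qquad \bar{X} = \frac{\sum X}{N} = 93; \qquad s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$

The value of t₀ is

$$t_0 = \frac{\bar{X} - \mu}{s / \sqrt{N}} = \frac{93 - 100}{s / \sqrt{26}} = -2.97$$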


We compare this value of t₀ with the model as given in the table of the
t-distribution (Table II, Appendix). We enter the row of the table
corresponding to n = N − 1, that is, n = 25 in our example. For
samples of 26 (n = 25), we expect to find values of t beyond ±2.787
in 1 per cent of cases; so, clearly, we should expect to find
values beyond ±t₀ = ±2.97 in repeated sampling from
this population in an even smaller percentage of cases. Our conclusion,
therefore, is that it is unlikely that our sample was drawn from a population
in which the mean I.Q. was 100, or, that the mean ability of the class
is significantly different from the norm of 100.
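A minimal Python sketch of the same calculation follows. The sample standard deviation is not legible in the text as it stands, so the value s = 12.0 below is a hypothetical figure chosen only because it gives a t₀ close to the 2.97 quoted above. The critical value 2.787 is the tabled 1 per cent point for n = 25 cited in the text.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, s, n):
    """t0 for a single mean when the population S.D. is estimated from the sample."""
    return (sample_mean - hypothesized_mean) / (s / math.sqrt(n))

S_HYPOTHETICAL = 12.0        # assumed sample S.D. (not given in the source)
T_01_N25 = 2.787             # tabled value quoted in the text for n = 25

t0 = one_sample_t(93, 100, S_HYPOTHETICAL, 26)
significant_at_1_per_cent = abs(t0) >= T_01_N25
print(round(t0, 2), significant_at_1_per_cent)
```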

Problem V.3. The significance of a mean from a small finite popula- 
tion. In most sampling problems a large population exists or is assumed 
to exist. At times the problem arises of using a sample which may 
comprise an appreciable part of a relatively small finite population 
sampled. The standard error of the mean is then adjusted as follows:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \qquad (5.01)$$

This adjustment follows from the fact that sampling errors affect only
the estimate of that fraction of the whole which is not included in the
sample. The value of σ, the population standard deviation, is usually
unknown, but the unbiased estimate of it can be obtained from the
sample. N is the size of the population; n, the number of sampling units.

For example, suppose a sample of 50 female students has been drawn
at random from the 500 female freshmen enrolled in a university. We
wish to test the hypothesis that the mean height of the 500 freshmen is
equal to 168 cm.

We calculated the following statistics for the sample of 50:

X̄ = 164.8 cm
s_X = 5.9 cm

Then

$$s_{\bar{X}} = \frac{5.9}{\sqrt{50}} \sqrt{\frac{500 - 50}{499}} = .79$$

$$t_0 = \frac{\bar{X} - \mu}{s_{\bar{X}}} = \frac{164.8 - 168}{.79} = -4.05$$

We compare the value t₀ with the t-model. Entering the table of the
t-distribution (Table II, Appendix) with n = 49 (one less than the sample
size of 50), we find that for n = 40, −t.0005 = −3.551 and that for
n = 60, −t.0005 = −3.460. Since our value is numerically greater than the
tabled values, we may reject the hypothesis that the mean height of the
500 freshmen is equal to 168 cm. If the statement that μ = 168 were true,
we should expect that in repeated sampling, 50 students selected at random
from the 500 would give a mean as divergent as 164.8 less than once in
2000 trials.
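The adjusted standard error can be sketched in Python as follows. The sketch assumes the divisor N − 1 = 499 that appears in the worked figures; the function name is our own. Note that the text rounds the standard error to .79 before dividing, which is why it reports −4.05 while the unrounded quotient is nearer −4.04.

```python
import math

def fpc_standard_error(s, n, population_size):
    """Standard error of a mean with the finite-population adjustment."""
    return (s / math.sqrt(n)) * math.sqrt((population_size - n) / (population_size - 1))

se = fpc_standard_error(5.9, 50, 500)
t0_unrounded = (164.8 - 168.0) / se
t0_as_in_text = (164.8 - 168.0) / round(se, 2)   # standard error rounded to .79 first
print(round(se, 2), round(t0_unrounded, 2), round(t0_as_in_text, 2))
```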

Problem V.4. The significance of the difference between means.
More frequently the problem is that of testing whether or not there is a
significant difference between means, that is, whether or not the samples
may be regarded as random samples from the same normal population;
to test the hypothesis that the true difference between means is zero.

The experimental results may also in this case be compared with the
model t-distribution in making the test of significance (see page 47),
because (1) the difference between two means may be regarded as normally
distributed about zero (if the hypothesis is true) with a standard
deviation σ, and (2) the standard error of the difference estimated on the
number of degrees of freedom provides an independent estimate of σ.
Since t, in general, is the ratio of (1) to (2), it is the appropriate criterion
for the test of the hypothesis involved here.

The following calculations and procedures enable us to make the
determination of the t₀ in this particular case (the subscripts refer to the
corresponding sample):

$$\bar{X}_1 = \frac{\sum X_1}{N_1}, \qquad \bar{X}_2 = \frac{\sum X_2}{N_2}$$

$$s^2 = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{N_1 + N_2 - 2}$$

$$t_0 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s^2 \left( \dfrac{1}{N_1} + \dfrac{1}{N_2} \right)}}$$

Refer t₀ to the table of t (Table II, Appendix).

Let us illustrate by taking the problem to test if two sets A and B
of test scores from two classes in algebra may be regarded to have come
from the same normal population. We obtain the following values:

Class A                          Class B
N₁ = 34                          N₂ = 30
ΣX₁ = 975                        ΣX₂ = 795
X̄₁ = 28.68                       X̄₂ = 26.50
Σ(X₁ − X̄₁)² = 4327.4             Σ(X₂ − X̄₂)² = 2969.5

$$t_0 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{N_1 + N_2 - 2} \left( \dfrac{1}{N_1} + \dfrac{1}{N_2} \right)}}$$

$$= \frac{28.68 - 26.50}{\sqrt{\dfrac{4327.4 + 2969.5}{34 + 30 - 2} \left( \dfrac{1}{34} + \dfrac{1}{30} \right)}} = \frac{2.18}{\sqrt{117.69 \times .062745}} = .802$$

We enter the t-table in the row corresponding to n = N₁ + N₂ − 2
to find the probability of obtaining a value of t numerically greater than
or equal to t₀ in repeated sampling. In the example, n = 34 + 30 − 2 = 62;
this specific value is not given in the table. It is observed from the entries
for n = 60 and n = 120 that the probability of getting a value of t
beyond ±0.802 in repeated sampling is somewhere
between .40 and .50. We conclude, therefore, that the two classes may
be assumed to be random samples from the same normal population or, in
other words, that the means of the two classes are not significantly different.
The pedagogical conclusion is that there is no real difference
between the average algebraic abilities of the two classes as measured by
the test used.
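The calculation for the two algebra classes may be sketched in Python. The function name is our own; the figures are those given for Class A and Class B.

```python
import math

def pooled_t(mean1, mean2, ss1, ss2, n1, n2):
    """t0 for two independent samples, pooling the sums of squared deviations."""
    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df
    se_difference = math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_difference, pooled_variance, df

t0, s2, df = pooled_t(28.68, 26.50, 4327.4, 2969.5, 34, 30)
print(round(t0, 3), round(s2, 2), df)
```

The degrees of freedom are returned with t₀ because the row of the t-table to be entered depends on them.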

The hypothesis tested above, that the two samples were random
samples from the same normal population, is equivalent to testing the
hypothesis, H₁, that μ₁ = μ₂ and σ₁² = σ₂², against the set of all the
alternative hypotheses which specify only that μ₁ ≠ μ₂ or σ₁² ≠ σ₂², or
both. General results have been obtained by Sato (Ref. 13, page 1) to
indicate that the t-test is also the uniformly most powerful of all the
unbiased exact tests that can possibly be made for the hypothesis, H₂:
In connection with two uncorrelated normal populations, π₁ and π₂, it is
assumed as given that σ₁ and σ₂ have the same (though unknown) value,
to test the hypothesis that μ₁ = μ₂ against the set of alternatives that
μ₁ ≠ μ₂. The study of the power function of the t-test under different
conditions has been made by Hsu (Ref. 13).

We shall next consider a practical problem which occasionally arises
when there is evidence to indicate that the variances of the two populations
from which random samples have been drawn are unequal and it is
desired to test the significance of the difference between the means.


Problem V.5. The significance of the difference between means
when the variances are unequal or unknown. For a precise test of significance,
first given by Behrens (Ref. 3), for the difference between the
means of two samples supposedly not drawn from equally variable populations,
or from populations having a known variance ratio, the Behrens-Fisher
method is available (Refs. 7, 8, 9, 22). Its application is made in
the following example.

At the end of a certain course in science, two groups, one in U High
School and one in B High School, took the Peterson Comprehensive
Science Examination (Ref. 18). The following results were recorded:


School U: N₁ = 14, X̄₁ = 73.21, (S.D.)₁ = 21.53
          or Σ(X₁ − X̄₁)² = 6489.5726
School B: N₂ = 12, X̄₂ = 56.30, (S.D.)₂ = 16.75
          or Σ(X₂ − X̄₂)² = 3366.7500

From these data we obtain:

(a) F = 1.6314, n₁ = 13, n₂ = 11: .30 > P > .20
(not significant)

(b)

$$d_0 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2}{N_1 (N_1 - 1)} + \dfrac{\sum (X_2 - \bar{X}_2)^2}{N_2 (N_2 - 1)}}} = 2.162 \qquad (5.02)$$

(c) For n₁ = 13, n₂ = 11, mean-variance ratio = 1.6314 × 12/14:

$$\theta = \tan^{-1} \sqrt{1.6314 \times \tfrac{12}{14}} = \tan^{-1}(1.1825) = 49^\circ 47'$$

We enter Sukhatme's table of the d-function (Ref. 22). We have
n₁ = 13, n₂ = 11, θ = 49°47′. No d.05 (or d.01) is given for these values,
so we must find d.05 to fit these values. We may interpolate for either
n₁ or n₂ first; the result will be the same in either case. Here we
shall interpolate for n₁ first. For n₁ = 13 we get the following d.05
values:

θ           0°      15°     30°     45°     60°     75°     90°
n₂ = 8      2.300   2.203   2.261   2.235   2.196   2.176   2.170
n₂ = 12     2.179   2.175   2.167   2.163   2.164   2.168   2.170

Now we interpolate for n₂ = 11 and get the following d.05 values:

n₁ = 13, n₂ = 11
θ           0°      15°     30°     45°     60°     75°     90°
d.05        2.190   2.185   2.175   2.169   2.167   2.169   2.170

For n₁ = 13, n₂ = 11, θ = 45°: d.05 = 2.169
For n₁ = 13, n₂ = 11, θ = 60°: d.05 = 2.167

Since our observed d₀ = 2.162 is less than either of these d.05 values and
since our value of θ, 49°47′, is between 45° and 60°, there is no need for
interpolating for θ. We now may declare our observed value of d
nonsignificant at the 5 per cent level.
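The statistic d₀ and the angle θ may be computed as in the following Python sketch. It covers only the arithmetic; the d.05 values 2.169 and 2.167 are those read from Sukhatme's table in the text, not values computed here, and the function name is our own. Because the code works from the unrounded sums of squares, its θ may differ from 49°47′ in the last minute of arc.

```python
import math

def behrens_fisher(mean1, mean2, ss1, ss2, n1, n2):
    """d0 and the angle theta (in degrees) for the Behrens-Fisher test."""
    var_mean1 = ss1 / (n1 * (n1 - 1))    # estimated variance of the first mean
    var_mean2 = ss2 / (n2 * (n2 - 1))    # estimated variance of the second mean
    d0 = (mean1 - mean2) / math.sqrt(var_mean1 + var_mean2)
    theta = math.degrees(math.atan(math.sqrt(var_mean1 / var_mean2)))
    return d0, theta

d0, theta = behrens_fisher(73.21, 56.30, 6489.5726, 3366.75, 14, 12)

# d.05 values quoted in the text for theta = 45 and 60 degrees.
D05_AT_45, D05_AT_60 = 2.169, 2.167
significant = d0 >= min(D05_AT_45, D05_AT_60)
print(round(d0, 3), round(theta, 2), significant)
```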

It is worth noting that if we had used the usual t-test for the hypothesis
of equal means, the hypothesis would have been rejected at the 5 per
cent level. Thus:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{N_1 + N_2 - 2} \cdot \dfrac{N_1 + N_2}{N_1 N_2}}} = 2.121 \qquad (5.03)$$

For n = N₁ + N₂ − 2 = 24, P < .05.


An approximate method for the same problem was proposed by
Cochran and Cox (Ref. 21), a method to test the hypothesis of equality
of means with no hypothesis about the population variance when N₁ ≠ N₂
and s₁ ≠ s₂. In this test the variance of each mean is calculated separately.
A criterion t is obtained by computing a weighted mean of the
two tabled t-values for the degrees of freedom of the two samples, the
weights being the two variances of the respective means. The ratio

$$\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}}$$

is then compared with the weighted t-value to judge the significance.

The approximate test has been applied to the same data analyzed
above by the Behrens-Fisher formula. The calculations are set forth
in Table 20.
TABLE 20
CALCULATIONS FOR THE COCHRAN-COX METHOD OF TESTING THE SIGNIFICANCE OF THE
HYPOTHESIS OF EQUALITY OF MEANS WITH NO HYPOTHESIS ABOUT THE POPULATION
VARIANCE

(1)      (2)   (3)        (4)          (5)         (6)        (7)     (8)
School   N     X̄          Σ(X − X̄)²    s²          s²_X̄       t.05    t.05 × s²_X̄

U        14    73.21      6489.5726    499.1959    35.6520    2.160    77.0191
B        12    56.30      3366.7500    306.0682    25.5057    2.201    56.1380

           D = 16.91      9856.3226                61.1577            133.1571

$$\text{The criterion (weighted } t) = \frac{t_1 s_{\bar{X}_1}^2 + t_2 s_{\bar{X}_2}^2}{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} = \frac{77.0191 + 56.1380}{35.6520 + 25.5057} = 2.177$$


The observed t is calculated as follows:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}} = \frac{73.21 - 56.30}{\sqrt{35.6570 + 25.5057}} = 2.162$$

Since the observed t is less than the criterion t, that is, since 2.162 < 2.177,
the hypothesis of equal means is not rejected. Thus, the same conclusion
is found as in the case of the exact test provided by the Behrens-Fisher
formula.
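The approximate test lends itself to a short Python sketch. The tabled values 2.160 and 2.201 are the 5 per cent points for 13 and 11 degrees of freedom used in Table 20; the function name is our own, and the variances of the means are recomputed from the sums of squares rather than copied from the table.

```python
import math

def cochran_cox(mean1, mean2, ss1, ss2, n1, n2, t_table1, t_table2):
    """Observed t and weighted criterion t for the Cochran-Cox approximate test."""
    var_mean1 = ss1 / (n1 * (n1 - 1))
    var_mean2 = ss2 / (n2 * (n2 - 1))
    t_observed = (mean1 - mean2) / math.sqrt(var_mean1 + var_mean2)
    t_criterion = (t_table1 * var_mean1 + t_table2 * var_mean2) / (var_mean1 + var_mean2)
    return t_observed, t_criterion

t_observed, t_criterion = cochran_cox(
    73.21, 56.30, 6489.5726, 3366.75, 14, 12, t_table1=2.160, t_table2=2.201
)
reject_equal_means = t_observed >= t_criterion
print(round(t_observed, 3), round(t_criterion, 3), reject_equal_means)
```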

Where the sizes of the samples are the same, that is, where N₁ = N₂,
the significance of the difference between the means can be determined,
even though the variances differ, by calculating the value of t in the usual
way, applying Formula (5.03). However, the t-table is entered with
d.f. = N₁ − 1 (= N₂ − 1) instead of N₁ + N₂ − 2.

Problem V.6. The significance of the difference between the means
of correlated measures. Situations arise in which the two samples are
equal in number and in which each individual of one sample corresponds
in some way to a particular individual of the second sample. Such is the
case, for example, when individuals have been paired or equated on
certain characteristics, in two different groups. One group is then
subjected to one type of treatment and the second to another. At the
end of the experimental period, evidence is obtained as to whether a
differential effect has resulted. In this case and in others of a like kind,
we can use the distribution of t as the theoretical model. It is necessary,
however, to calculate t₀ in a way different from that illustrated in Problem
V.4. As before, we wish to determine whether the two groups may be
regarded as random samples from the same normal population or to test
the null hypothesis that the two means are the same. The individuals in
the two groups have been equated, a fact that must be taken into account
when setting up the model. If there is no differential effect, then clearly
the difference between the criterion measures of the paired individuals
should be zero.

In practice, as has been noted, in taking means of samples from the
same population, the differences will never be exactly zero, even if there
is no differential effect. The distribution of differences between corresponding
values now constitutes a single sample, for which the mean
difference and the standard error of the mean difference can be calculated
in the usual manner. The ratio of the mean difference to its standard
error will be distributed as t in repeated sampling. Therefore, the
distribution of t is the theoretical model against which to check the
experimental results.

The following data were obtained in an experiment to compare the 
efficacy of two methods of teaching elementary algebra to high-school 
classes. One group was taught by the group method, the other by the 
individual method. The individuals constituting each of the 25 pairs 
were equated on the basis of intelligence test scores and mathematical 
pretests. 

Our problem is to test the null hypothesis that there is no difference
between the two teaching methods with respect to the outcomes measured.
This is equivalent to determining from the experimental data
whether the mean scores on the criterion of the two groups are the same,
that is, whether the two classes may be assumed to be random samples
from the same normal population. If it is found that the mean scores
are significantly different, the conclusion will be drawn that there is evidence
of a differential effect between the two methods of teaching.¹

The data are recorded in Table 21. We first calculate the differences
between the scores made by the paired individuals. These differences
are given in column (4). We then find the mean of the distribution of
differences:

Mean difference,

$$\bar{D} = \frac{\sum D}{N} = \frac{232}{25} = 9.28$$

We next calculate the variance of the differences:

$$s_D^2 = \frac{N \sum D^2 - (\sum D)^2}{N(N - 1)} = \frac{25 \times 8962 - (232)^2}{(25)(24)} = 283.71$$

1 For a rigorous discussion of the single-factor experiment, see page 286.


TABLE 21
CALCULATIONS FOR TESTS OF SIGNIFICANCE OF DIFFERENCE IN PAIRED GROUPS BY
TWO METHODS

         Achievement          Difference
Pair     Experi-   Control    D       D²      x =        y =        xy       x²       y²
No.      mental                               (2) - 50   (3) - 50
(1)      (2)       (3)        (4)     (5)     (6)        (7)        (8)      (9)      (10)

I          73       58        +15     225     +23         +8        +184      529       64
II         52       37        +15     225      +2        -13         -26        4      169
III       100       53        +47   2,209     +50         +3        +150    2,500        9
IV         60       77        -17     289     +10        +27        +270      100      729
V          75       51        +24     576     +25         +1         +25      625        1
VI         67       62         +5      25     +17        +12        +204      289      144
VII        61       55         +6      36     +11         +5         +55      121       25
VIII       59       30        +29     841      +9        -20        -180       81      400
IX         33       39         -6      36     -17        -11        +187      289      121
X          19       16         +3       9     -31        -34      +1,054      961    1,156
XI         32       15        +17     289     -18        -35        +630      324    1,225
XII        27       37        -10     100     -23        -13        +299      529      169
XIII       68       44        +24     576     +18         -6        -108      324       36
XIV        54       27        +27     729      +4        -23         -92       16      529
XV         26       43        -17     289     -24         -7        +168      576       49
XVI        30       27         +3       9     -20        -23        +460      400      529
XVII       69       53        +16     256     +19         +3         +57      361        9
XVIII      43       29        +14     196      -7        -21        +147       49      441
XIX        23       13        +10     100     -27        -37        +999      729    1,369
XX         11       17         -6      36     -39        -33      +1,287    1,521    1,089
XXI        26       20         +6      36     -24        -30        +720      576      900
XXII       30        9        +21     441     -20        -41        +820      400    1,681
XXIII      28       35         -7      49     -22        -15        +330      484      225
XXIV       53       21        +32   1,024      +3        -29         -87        9      841
XXV        23       42        -19     361     -27         -8        +216      729       64

Sum (+)                       314             191         59       8,262
Sum (-)                        82             299        399         493
Total   1,142      910       +232   8,962    -108       -340      +7,769   12,526   11,974

$$\bar{X} = \frac{1142}{25} = 45.68, \qquad \bar{Y} = \frac{910}{25} = 36.40, \qquad \bar{D} = \frac{232}{25} = 9.28$$

Check: X̄ = −108/25 + 50 = 45.68

Check: Ȳ = −340/25 + 50 = 36.40

Check: X̄ − Ȳ = 45.68 − 36.40 = 9.28

TABLE 21 (Continued)

Method 1

$$s_{\bar{D}} = \sqrt{\frac{N \sum D^2 - (\sum D)^2}{N^2 (N - 1)}} = \sqrt{\frac{25 \times 8962 - (232)^2}{(25)^2 (24)}} = 3.368$$

$$t_0 = \frac{9.28}{3.37} = 2.754$$

n = N − 1 = 24
P < .05, or .02 > P > .01
t.02 = 2.492; t.01 = 2.797

Method 2

$$s_{\bar{X} - \bar{Y}} = \sqrt{\frac{N \sum x^2 + N \sum y^2 - 2N \sum xy}{N^2 (N - 1)}}$$

Here X and Y denote the coded scores of columns (6) and (7):

N Σx² = N ΣX² − (ΣX)² = (25)(12,526) − (−108)² = 301,486
N Σy² = N ΣY² − (ΣY)² = (25)(11,974) − (−340)² = 183,750
2N Σxy = 2[N ΣXY − (ΣX)(ΣY)] = 2[(25)(7769) − (−108)(−340)] = 2(157,505) = 315,010
N²(N − 1) = (25)²(24) = 15,000

$$s_{\bar{X} - \bar{Y}} = \sqrt{\frac{301{,}486 + 183{,}750 - 315{,}010}{15{,}000}} = \sqrt{11.3484} = 3.368$$

$$t_0 = \frac{9.28}{3.37} = 2.754$$

P < .05; .02 > P > .01


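The variance of the mean is then

$$s_{\bar{D}}^2 = \frac{283.71}{25} = 11.348$$

The standard error of the mean is

$$s_{\bar{D}} = \sqrt{11.348} = 3.37$$

In one operation, the calculation of s_D̄ is

$$s_{\bar{D}} = \sqrt{\frac{N \sum D^2 - (\sum D)^2}{N^2 (N - 1)}} \qquad (5.04)$$

$$= \sqrt{\frac{224{,}050 - 53{,}824}{(625)(24)}} = 3.37$$

Then

$$t_0 = \frac{\bar{D}}{\sqrt{\dfrac{\sum (D - \bar{D})^2}{N(N - 1)}}} = \frac{9.28}{3.37} = 2.7537 \qquad (5.05)$$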
From the table of t, entering the row corresponding to n = N − 1
= 24, we find that the chance of getting a value of t beyond ±t₀,
that is, ±2.754, is slightly greater than 1 in 100 (t.01 = 2.797).
Hence, the null hypothesis is rejected at the 5 per cent level of significance.
We conclude that the two groups cannot be assumed to be random
samples from the same normal population, or that the mean scores are
significantly different at the 5 per cent level. The educational conclusion
under certain assumptions is that the two methods of teaching produced
significantly different results.
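The calculation by differences may be sketched in Python. The scores below are those of the 25 pairs as read from Table 21, and the function name is our own. Since the code carries the standard error unrounded, t₀ comes out at 2.755 rather than the 2.754 obtained in the text with the rounded divisor 3.37.

```python
import math

# Scores of the 25 pairs as read from Table 21.
experimental = [73, 52, 100, 60, 75, 67, 61, 59, 33, 19, 32, 27, 68,
                54, 26, 30, 69, 43, 23, 11, 26, 30, 28, 53, 23]
control = [58, 37, 53, 77, 51, 62, 55, 30, 39, 16, 15, 37, 44,
           27, 43, 27, 53, 29, 13, 17, 20, 9, 35, 21, 42]

def paired_t(x, y):
    """Mean difference, variance of differences, S.E. of the mean difference, and t0."""
    n = len(x)
    differences = [a - b for a, b in zip(x, y)]
    sum_d = sum(differences)
    sum_d2 = sum(d * d for d in differences)
    mean_d = sum_d / n
    var_d = (n * sum_d2 - sum_d ** 2) / (n * (n - 1))
    se_mean = math.sqrt(var_d / n)
    return mean_d, var_d, se_mean, mean_d / se_mean

mean_d, var_d, se_mean, t0 = paired_t(experimental, control)
print(round(mean_d, 2), round(var_d, 2), round(se_mean, 3), round(t0, 3))
```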

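When a large number of pairs of individuals are used it may be
advantageous to work with the original measures, thus avoiding the calculation
of the differences.

Using this method of calculation, the value of t₀ is

$$t_0 = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{N \sum x_1^2 + N \sum x_2^2 - 2N \sum x_1 x_2}{N^2 (N - 1)}}} \qquad (5.06)$$

where x₁ and x₂ are deviations from the respective means, so that
N Σx₁² = N ΣX₁² − (ΣX₁)², and similarly for the other terms.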

The calculations are shown for the same problem used to illustrate
the method based on the calculation of the differences. This method
follows (with appropriate methods for reducing the mathematical calculations)
the more commonly used formula of the standard error of the
difference between the means of correlated measures:

$$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{\sigma_x^2 + \sigma_y^2 - 2 r_{xy} \sigma_x \sigma_y}{N}} \qquad \text{when } X = X_1,\ Y = X_2 \qquad (5.07)$$

The demonstration of the equivalence by applying the respective
methods to the same set of data is given in Table 21. It should be noted
that the unbiased estimate of σ² is used in both cases.

If we had not utilized the information provided by the experimental
design, different results would have been obtained as noted below. Using
the method for testing the significance of the difference between the
means of random samples as in Problem V.4, we have, since N₁ = N₂,

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{N(N - 1)}}} \qquad (5.08)$$

$$= \frac{9.28}{\sqrt{32.3491}} = 1.6$$

Entering the table of t with n = 2(N − 1) = 48, it is observed
(without interpolation) that we should expect to get a value of t
beyond ±t, i.e., ±1.6, in repeated sampling in more than 5 per
cent of the cases. The conclusions are, therefore, altered from those
drawn earlier by the calculation of t₀.
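For comparison, the same data treated as two independent samples can be sketched in Python. The sums of squared deviations are obtained from the quantities 301,486 and 183,750 of Table 21 (each is N times the sum of squares); the function name is our own.

```python
import math

def unpaired_t_equal_n(mean_difference, ss1, ss2, n):
    """t for two independent samples of equal size n, ignoring the pairing."""
    se_difference = math.sqrt((ss1 + ss2) / (n * (n - 1)))
    return mean_difference / se_difference, 2 * (n - 1)

N_PAIRS = 25
ss_experimental = 301486 / N_PAIRS   # sum of squared deviations, experimental group
ss_control = 183750 / N_PAIRS        # sum of squared deviations, control group

t_unpaired, df = unpaired_t_equal_n(9.28, ss_experimental, ss_control, N_PAIRS)
print(round(t_unpaired, 2), df)
```

The drop from a t₀ of about 2.75 on 24 degrees of freedom to a t of about 1.63 on 48 shows how much of the variation between individuals the pairing removed in this experiment.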

It is usually advisable to calculate both t₀ and t and make both tests
of significance. Sometimes one and at other times the other may be the
more sensitive. If either one or the other shows a significant difference
between the means, it is safe to accept the conclusion of significance. If,
as is most often the case in experimental work, the corresponding values
for the respective paired individuals are positively correlated, the standard
deviation of the differences will be thereby reduced. Against this
favoring circumstance must be weighed the fact that in treating the
results as a single sample, the number of degrees of freedom is only half
as great as if the two samples had been treated separately, that is, if
two random samples were used for experimental subjects. From the
findings of the two tests of significance for the same data, a direct statistical
measure of the efficacy of the basis of pairing used is made available.²
Problem V.7. The sign test of significance. A simple test of significance
is available for application to the data in Problem V.6. This is the
"sign test" or "binomial series test" for the case of randomized blocks
with two columns (Refs. 10 and 5). The statistic used is the number of
positive differences among the differences of the several pairs of individuals.
The zero differences are usually divided evenly among the
positive and negative ones. Thus: p₀ = 18/25 = .72. The mean number
of positive values expected according to the binomial series (½ + ½)ⁿ
is

$$\mu = np = 12.5$$

$$\sigma = \sqrt{npq} = .5 \sqrt{n} = 2.5$$

$$X = \frac{n (p_0 - .50)}{.5 \sqrt{n}} \qquad (5.09)$$

$$= \frac{25 \left( \tfrac{18}{25} - .50 \right)}{.5 \sqrt{25}} = \frac{25 (.72 - .50)}{2.5} = \frac{5.5}{2.5} = 2.2$$

X may be referred to a normal scale (Table I, Appendix), from which it
is found that P = .0278, or P < .05. The hypothesis of no difference
between the two groups as revealed by the differences in signs is rejected
at the 5 per cent level.
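A Python sketch of the sign test as applied here follows. It uses the normal approximation of Formula (5.09) with a two-sided probability, which reproduces the P = .0278 of the text; the function name is our own, and the count of 18 positive differences among 25 pairs is that of Problem V.6.

```python
import math
from statistics import NormalDist

def sign_test(n_positive, n_pairs):
    """Proportion of positive signs, normal deviate X, and two-sided probability."""
    p0 = n_positive / n_pairs
    x = n_pairs * (p0 - 0.5) / (0.5 * math.sqrt(n_pairs))
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(x)))
    return p0, x, p_two_sided

p0, x, p_two_sided = sign_test(18, 25)
print(round(p0, 2), round(x, 2), round(p_two_sided, 4))
```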

The method differs from the more reliable t-test in using only the
information in the sign as compared with the total available information
in the actual values used by the latter. The former method may be
shown to be 62 per cent as efficient as the latter; that is, 62 pairs using the
t-test would give as precise results as 100 pairs in using the sign test.³

Problem V.8. The significance of the difference between per- 
centages. There is frequent need to determine the significance of the 
difference between two percentages. ‘Take, for instance, the following 
problem: 

According to one investigator, 67 of an unselected sample of 793 males
and 3 of an unselected sample of 232 females from the same United States
Caucasoid population were color-blind. Is this evidence of a sex difference
in this trait? The hypothesis to be tested is

H₀: p₁ = p₂ = p


2 For further discussion of the efficiency of this experimental design see page 292.
3 For meaning of "efficiency," see page 1

The maximum likelihood estimate of p is

$$p_0 = \frac{t_1 + t_2}{n_1 + n_2}$$

where t₁ is the number of color-blind individuals in the male sample;
t₂, in the female sample.

$$p_0 = \frac{67 + 3}{793 + 232}$$

$$X = \frac{\dfrac{t_1}{n_1} - \dfrac{t_2}{n_2}}{\sqrt{p_0 q_0 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \qquad (5.10)$$

$$= \frac{\tfrac{67}{793} - \tfrac{3}{232}}{\sqrt{\left( \tfrac{70}{1025} \right) \left( \tfrac{955}{1025} \right) \left( \tfrac{1}{793} + \tfrac{1}{232} \right)}} = 3.8$$

Referred to the normal scale, it is found that P < .01. Therefore, the
hypothesis is rejected; that is, there is a significant sex difference in
color-blindness in the population sampled.⁴
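The test of Formula (5.10) may be sketched in Python as follows. The function name is our own, and the two-sided probability is computed from the standard library's normal distribution in place of the table.

```python
import math
from statistics import NormalDist

def two_proportion_test(t1, n1, t2, n2):
    """Pooled estimate of p, normal deviate X, and two-sided probability."""
    p0 = (t1 + t2) / (n1 + n2)
    q0 = 1.0 - p0
    se_difference = math.sqrt(p0 * q0 * (1.0 / n1 + 1.0 / n2))
    x = (t1 / n1 - t2 / n2) / se_difference
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(x)))
    return p0, x, p_two_sided

# 67 color-blind of 793 males; 3 color-blind of 232 females.
p0, x, p_two_sided = two_proportion_test(67, 793, 3, 232)
print(round(p0, 4), round(x, 1), p_two_sided < 0.01)
```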

Problem V.9. The significance of the difference between the absolute
variabilities of two groups. The following problems illustrate the
method of testing the significance of the difference between two variances:

(a) From the measurements of heights in centimeters of 2518 boys
and 2538 girls, both groups fourteen years of age, the sum of squares
of the deviations for the former was 189,811.641552 and for the latter
114,896.931496. Is there a significant difference in absolute variability?

The calculations are carried out in Table 22.


TABLE 22
THE SIGNIFICANCE OF THE DIFFERENCE BETWEEN TWO ESTIMATES OF VARIANCE

Sex       Degrees of     Sum of squares      Log mean      1/n
          freedom                            square
Male      2517           189,811.641552      4.3226        .0003973
Female    2537           114,896.931496      3.8133        .0003942

                                      Diff:  0.5093   Sum: .0007915

The mean squares are obtained by dividing the sum of squares by the
degrees of freedom. The difference of the logarithms is 0.5093, so z is
0.2546. The variance of z is one-half the sum of the last column, or
.0003957; the standard error of z is .0199.

$$\frac{z}{\sigma_z} = 12.8$$

4 The χ² test is an exact test for this problem.

Referred to the normal scale, we find:

P < .001

Therefore, there is a significant difference in the variability of the two sexes.⁵
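The large-sample z-test of Table 22 may be sketched in Python. Natural logarithms are assumed, and the degrees of freedom are taken as one less than each sample size (2517 and 2537). Working from unrounded logarithms, the code gives z = 0.2550 rather than the 0.2546 of the table, a difference that does not affect the conclusion. The function name is our own.

```python
import math

def z_test_two_variances(ss1, df1, ss2, df2):
    """Fisher's z (half the difference of log mean squares), its S.E., and their ratio."""
    z = 0.5 * (math.log(ss1 / df1) - math.log(ss2 / df2))
    se_z = math.sqrt(0.5 * (1.0 / df1 + 1.0 / df2))
    return z, se_z, z / se_z

z, se_z, ratio = z_test_two_variances(189811.641552, 2517, 114896.931496, 2537)
print(round(z, 4), round(se_z, 4), round(ratio, 1))
```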

(b) Two samples of boys were available in a city school system.
One sample of 121 boys of twelve years of age had a mean weight of 72.7
lb. Fifteen years later, another sample of 61 boys twelve years of age
from the same school had a mean weight of 77.74 lb. The mean square
of the weights (lb)² of the first sample was 141.60 and that of the second
sample, 95.756. Is the difference in variability significant?

The calculations for the test of significance are set forth in Table 23.


TABLE 23
THE z-TEST OF THE SIGNIFICANCE OF THE DIFFERENCE BETWEEN TWO VARIANCE
ESTIMATES

Sample     Degrees of     Weight mean     1/2 log_e
           freedom        square          mean square
  1           120          141.600          2.4765
  2            60           95.756          2.2809

                                     z_0 = 0.1956


We enter the z-table of Fisher (Ref. 10) with $n_1 = 120$ and $n_2 = 60$
and by interpolation find that $z_{.05} = .1917$. We could enter Table IV,
Appendix, of the variance ratio, F, with $n_1 = 120$, $n_2 = 60$, and

$$F_0 = \frac{141.600}{95.756} = 1.479,$$

and find $F_{.05} = 1.472$.
Since $z_0$ is slightly larger than $z_{.05}$, or $F_0$ slightly larger than $F_{.05}$, we
conclude that the difference in variability in weight between the two
groups of boys is significant (at the 5 per cent level).
Problem V.10. To test the homogeneity of a set of estimated variances.
The statistical analysis of data often involves the calculation of a
number of estimated variances and the testing of whether the various
estimates are significantly different. Three tests of homogeneity of
variability are described here.
Neyman and Pearson (Ref. 16) used the criterion $L_1$, the ratio of a
weighted geometric to a weighted arithmetic mean of the mean squares
from which the variances were estimated, in order to test the hypothesis,
$H_1$:

$$\sigma_1 = \sigma_2 = \cdots = \sigma_k = \sigma$$

This is the test that these k independent samples have been drawn from
normal populations having a common standard deviation.

* The near normality of z for large and equal values of $n_1$ and $n_2$ (see page 00)
has been the basis of the test used here. Either the z- or the F-test
could have been used.
Welch (Ref. 24) indicated how $L_1$ could be generalized and how the
weighting for the different sums of squares could be modified. Nayer
(Ref. 15) computed tables of the 5 per cent and 1 per cent probability
levels for $L_1$ for the case of equally sized samples. He also considered
how far, in the case of unequally sized samples, the probability levels for
$L_1$ might be obtained from his tables. Nair (Ref. 14) investigated the
form of the true distribution of $L_1$.

The test presented here is based on the modified formula of Welch
and the use of Nayer's tables of the $L_1$-distribution.

Welch's equation is

$$L_1 = \frac{N \prod_s \left(\theta_s / n_s\right)^{n_s/N}}{\sum_s \theta_s} \qquad (5.11)$$

where $s = 1, 2, \ldots, k$; $\prod$ denotes product; $\sum$ denotes summation; $n_s$,
the number of individuals within the sth sample; $N$, the number of
individuals in all the samples; and $\theta_s$ is the sum of squares of the errors
or the residual of a sample. In the case considered here,

$$\theta_s = \sum_i (X_{si} - \bar{X}_s)^2$$

where $X_{si}$ represents the value of the variate for the ith individual in the
sth sample and $\bar{X}_s$ represents the mean of the sth sample.

Nayer's tables are entered with k, the number of samples, and $\bar{n} = N/k$,
the average sample size. Hartley (Ref. 12) later indicated that the
geometric rather than the arithmetic mean should be used when an
average of unequally sized samples is needed. In using $L_1$-tables, rejection
of the hypothesis, $H_1$, is indicated when the obtained $L_1$ is equal to
or less than the tabled values of $L_1$ at the respective 1 or 5 per cent level
(Table V, Appendix).

The second test of homogeneity of variances was given by Bartlett
(Ref. 2). He suggested a test analogous to the $L_1$ test in which the sums
of squares are weighted with the appropriate number of degrees of freedom
instead of with the number of observations as in the Neyman-Pearson
criterion. Thus where $s_t^2$ is the unbiased estimate of $\sigma^2$ based on a sum
of squares having $\nu_t$ degrees of freedom, and there are k independent
estimates, the test function is

$$-2 \log_e \mu = N \log_e \left[ \frac{\sum_{t=1}^{k} \nu_t s_t^2}{N} \right] - \sum_{t=1}^{k} \left( \nu_t \log_e s_t^2 \right) \qquad (5.12)$$

where $N = \sum_{t=1}^{k} \nu_t$
and natural logarithms to the base e are used. Where none of the $\nu_t$ are
too small, $-2 \log_e \mu$ is distributed approximately as $\chi^2$ with k − 1
degrees of freedom if the $\sigma_t$ (t = 1, 2, ..., k) have a common value.
Bartlett gave a corrective factor, C, for small samples:

$$C = 1 + \frac{1}{3(k-1)} \left( \sum_{t=1}^{k} \frac{1}{\nu_t} - \frac{1}{N} \right) \qquad (5.12a)$$

He indicated that the quantity $-\frac{2}{C} \log_e \mu$ followed approximately the
same $\chi^2$ distribution.


Bishop and Nair (Ref. 4) demonstrated that even in using the correction
factor C, the $\chi^2$ approximation is not altogether satisfactory if some
of the degrees of freedom, $\nu_t$, are 1, 2, or 3. Later, Hartley (Ref. 12)
derived a method of approximating to the distribution of Bartlett's
$-2 \log_e \mu$, which was shown to be sufficiently accurate to permit the
degrees of freedom to drop to 2 with a fair approximation even if some
of the variance estimates based on 1 degree of freedom are among the
k values. In Hartley's method the probability integral is represented
as a weighted mean of $\chi^2$ integrals. Thompson and Merrington (Ref.
23) have published tables of the criterion called M, based on Hartley's
approximation.


We shall illustrate the three tests of homogeneity of the variances by
applying them to the same set of data.

In Table 24, column three is given a set of five estimates of variance,
calculated from five samples of intelligence test records of pupils in five
different grades of a given school. It is desired to test whether or not
there are any real grade differences in the test score dispersion of the
pupils. To this end the calculations in obtaining the value of the
criterion $M_0$ are set forth as shown in the table.


TABLE 24
CALCULATIONS FOR OBTAINING THE VALUE OF THE CRITERION FOR BARTLETT'S TEST
OF THE HOMOGENEITY OF ESTIMATED VARIANCES

 (1)      (2)        (3)            (4)      (5)           (6)             (7)
Grade    No. of    Intelligence     ν_t    log_e s_t²   ν_t log_e s_t²    1/ν_t
   t     pupils    variance, s_t²
                   (score²)

   3       35        59.5345        34      4.08656      138.94304       0.02941
   4       37        98.4369        36      4.58942      165.21912       0.02777
   5       35       105.1378        34      4.65527      158.27918       0.02941
   6       36       138.3325        35      4.92966      172.53810       0.02857
   7       37        39.4520        36      3.67509      132.30324       0.02777

 Total                                                   767.28268       0.14293
We obtain further:

$$\sum \nu_t s_t^2 = 15{,}404.4960$$

$$\frac{\sum \nu_t s_t^2}{\sum \nu_t} = 88.0257$$

$$\log_e \left[ \frac{\sum \nu_t s_t^2}{\sum \nu_t} \right] = 4.47763$$

Following (5.12), we obtain

$$-2 \log_e \mu = M_0 = 175 \times 4.47763 - 767.28268 = 16.3026$$

Entering Table VII of the 1 per cent points of the M-distribution
(Ref. 23) with k = 5, it is found that all entries opposite k = 5 are less
than 16.3026. Without further calculation, therefore, it may be concluded
that $M_0 = 16.3026$ is significant at the 1 per cent level. We may
infer that a significant difference exists in the intelligence dispersion, as
measured by this test, among the five grades.

Since the tables of the M distribution are not as yet readily accessible,
the test may be made by application of (5.12) and (5.12a). Thus:

$$\frac{\sum_{t=1}^{k} \nu_t s_t^2}{N} = \frac{15{,}404.4960}{175} = 88.0257$$

$$N \log_e \left[ \frac{\sum_{t=1}^{k} \nu_t s_t^2}{N} \right] = 175(4.47763) = 783.58525$$

$$\sum_{t=1}^{k} \left( \nu_t \log_e s_t^2 \right) = 767.28268$$

$$C = 1 + \frac{1}{3(4)} \left( \frac{1}{34} + \frac{1}{36} + \frac{1}{34} + \frac{1}{35} + \frac{1}{36} - \frac{1}{175} \right) = 1.01144$$

$$\chi_0^2 = \frac{16.3026}{1.01144} = 16.118$$

We enter the $\chi^2$-table (Table III, Appendix) with k − 1, or 4 degrees
of freedom. We find that our obtained value $\chi_0^2$ is larger than the table
value $\chi^2 = 13.277$ at the one per cent point. Therefore, we reject the
hypothesis, $H_0$, and conclude that a significant difference exists in the
variability of intelligence test scores among these five grades.
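The steps of (5.12) and (5.12a) can be checked with the following minimal sketch, which uses only the variances and degrees of freedom of Table 24. The function name `bartlett_test` is an illustrative choice, not one taken from the text.

```python
import math

def bartlett_test(variances, dfs):
    """Return (M, C, M/C) for Bartlett's test, using natural logarithms."""
    k = len(variances)
    n_total = sum(dfs)                                   # N = sum of the degrees of freedom
    pooled = sum(v * s2 for v, s2 in zip(dfs, variances)) / n_total
    m = n_total * math.log(pooled) - sum(
        v * math.log(s2) for v, s2 in zip(dfs, variances)
    )                                                    # equation (5.12)
    c = 1.0 + (sum(1.0 / v for v in dfs) - 1.0 / n_total) / (3.0 * (k - 1))  # (5.12a)
    return m, c, m / c

variances = [59.5345, 98.4369, 105.1378, 138.3325, 39.4520]   # Table 24, column (3)
dfs = [34, 36, 34, 35, 36]                                    # Table 24, column (4)
M, C, chi2 = bartlett_test(variances, dfs)
print(f"M = {M:.4f}, C = {C:.5f}, corrected chi-square = {chi2:.3f}")
```

The corrected value is then referred to the $\chi^2$-table with k − 1 = 4 degrees of freedom, as in the text.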

It may be pointed out here that Bartlett's test would appear advantageous
in comparison with the $L_1$-test when the size of the samples is
much larger than n = 60 (the limit of finite values as given in the Nayer
table) and an interpolation between 60 and infinity needs to be made.
Since the range of the $L_1$-values is only from 0 to 1, the test is not highly
sensitive.

The tables of the M distribution may encourage the use of Hartley's
approximation, which is likely to be more convenient as well as slightly
more accurate.

We now apply the $L_1$-test, which is made as follows, to the same data
as in Table 24.

We calculate first the value $\log L_1$ from Formula (5.11). The calculations
are set forth in Table 25.

TABLE 25
THE CALCULATION OF LOG L_1 FOR THE L_1-TEST OF HOMOGENEITY OF VARIANCES

 n_s     f_s     log n_s       θ_s          log θ_s
  35      34     1.5441     2,024.1714      3.3062
  37      36     1.5682     3,543.7297      3.5495
  35      34     1.5441     3,574.6857      3.5532
  36      35     1.5563     4,841.6389      3.6850
  37      36     1.5682     1,420.2703      3.1524

 Σ n_s log n_s = 280.1606    Σ θ_s = 15,404.4960    Σ n_s log θ_s = 620.7093

$$\log L_1 = \log N - \frac{1}{N} \sum n_s \log n_s + \frac{1}{N} \sum n_s \log \theta_s - \log \left( \sum \theta_s \right) = \bar{1}.95951$$

We find that $L_1$ corresponding to the logarithm $\bar{1}.95951$ is .911.

$$k = 5; \quad \text{harmonic mean of } f_s = \frac{5}{\frac{1}{34} + \frac{1}{36} + \frac{1}{34} + \frac{1}{35} + \frac{1}{36}} = 34.98$$

We enter Nayer's tables (Table V, Appendix) with k = 5 and f = 35
and note that our value, .911, is less than the interpolated 1 per cent
value of $L_1$. Therefore, we reject the hypothesis and infer that there is a
real difference in variability in the intelligence test scores among the
five grades.
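As a check on Table 25, the criterion of (5.11) can be computed directly. The sketch below assumes that each $\theta_s$ is recovered as the degrees of freedom times the variance estimate of Table 24; the name `l1_criterion` is hypothetical.

```python
import math

def l1_criterion(sample_sizes, sums_of_squares):
    """L1 = weighted geometric mean of theta_s/n_s over their weighted arithmetic mean."""
    n_total = sum(sample_sizes)
    log_geometric = sum(
        n * math.log(theta / n) for n, theta in zip(sample_sizes, sums_of_squares)
    ) / n_total
    log_arithmetic = math.log(sum(sums_of_squares) / n_total)
    return math.exp(log_geometric - log_arithmetic)

sizes = [35, 37, 35, 36, 37]                                   # n_s
dfs = [34, 36, 34, 35, 36]                                     # f_s
variances = [59.5345, 98.4369, 105.1378, 138.3325, 39.4520]    # Table 24
theta = [f * s2 for f, s2 in zip(dfs, variances)]              # sums of squares (assumed f_s * s_s^2)

L1 = l1_criterion(sizes, theta)
harmonic_f = len(dfs) / sum(1.0 / f for f in dfs)              # harmonic mean of f_s
print(round(L1, 3), round(harmonic_f, 2))
```

Working in natural rather than common logarithms does not change the value of $L_1$, since only a ratio of means is involved.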

Problem V.11. The significance of the difference between two correlation
coefficients. The following product-moment coefficients of correlation
were obtained between scores on two examinations in algebra
administered at the end of the school year in May and at the beginning
of the next school year in September. These results were obtained by
a mathematics teacher in two different schools:

School A: $r_1 = .78$; $n_1 = 59$
School B: $r_2 = .62$; $n_2 = 48$

Is there a significant difference between the two correlation coefficients,
$r_1$ and $r_2$?

$$z_1 = \frac{1}{2} \log_e \frac{1 + r_1}{1 - r_1}, \qquad z_2 = \frac{1}{2} \left[ \log_e (1 + r_2) - \log_e (1 - r_2) \right]$$

$$\frac{z_1 - z_2}{\sigma_{z_1 - z_2}} = \frac{1.045 - .725}{\sqrt{\frac{1}{56} + \frac{1}{45}}} = 1.60$$

We enter the normal table (Table I, Appendix) and find that for
a ratio of 1.60, P > .05. We, therefore, conclude that there is no significant
difference between the two correlation coefficients.
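The z-transformation and the ratio of the difference to its standard error can be sketched as follows. The two-sided normal probability is an added convenience, computed with `math.erfc`, and is not part of the original calculation; the function name is illustrative.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Compare two independent correlation coefficients by the z-transformation."""
    z1 = math.atanh(r1)            # atanh(r) equals 0.5 * ln((1 + r) / (1 - r))
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    ratio = (z1 - z2) / se
    p_two_sided = math.erfc(abs(ratio) / math.sqrt(2.0))   # normal-scale probability
    return z1, z2, ratio, p_two_sided

z1, z2, ratio, p = compare_correlations(0.78, 59, 0.62, 48)
print(round(z1, 3), round(z2, 3), round(ratio, 2), round(p, 3))
```

With these data the probability exceeds .05, in agreement with the conclusion above.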

Problem V.12. The significance of the difference between correlation
coefficients determined on the same sample. The following product-moment
coefficients of correlation were obtained in a class in college
biology, consisting of 73 students.

$r_{1y} = .30$, the correlation coefficient between scores on a test on
vocabulary (1) and scores on a test for interpreting various situations
dealing with states of health and disease (y);

$r_{2y} = .42$, the correlation coefficient between scores on a test of
biological principles (2) and scores on test (y);

$r_{12} = .603$, the correlation coefficient between tests (1) and (2).

The problem is to test the significance of the difference between the
correlation coefficients, $r_{1y}$ and $r_{2y}$. Since these correlations were
obtained on the same sample the procedure described on page 54 is
followed:

$$F_0 = \frac{(r_{1y} - r_{2y})^2 (N - 3)(1 + r_{12})}{2(1 - r_{12}^2 - r_{1y}^2 - r_{2y}^2 + 2 r_{12} r_{1y} r_{2y})}$$

$$F_0 = \frac{(.30 - .42)^2 (73 - 3)(1 + .603)}{2[1 - (.603)^2 - (.30)^2 - (.42)^2 + 2(.603)(.30)(.42)]} = 1.55$$

We enter the table of F (Table IV, Appendix) with $n_1 = 1$ and
$n_2 = 70$. We find that our value, 1.55, is less than $F_{.05} = 3.98$; hence $F_0$
is not significant. We conclude that there is no significant difference in
the two correlation coefficients.
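The F-ratio above is easily verified. The sketch below simply evaluates the displayed formula; the function name `dependent_correlation_f` is an assumption made for illustration.

```python
def dependent_correlation_f(r1y, r2y, r12, n):
    """F-ratio (1 and n - 3 d.f.) for two correlations that share the variable y."""
    numerator = (r1y - r2y) ** 2 * (n - 3) * (1 + r12)
    denominator = 2 * (1 - r12**2 - r1y**2 - r2y**2 + 2 * r12 * r1y * r2y)
    return numerator / denominator

F0 = dependent_correlation_f(0.30, 0.42, 0.603, 73)
print(round(F0, 2))
```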
Problem V.13. To test the significance of a regression coefficient.
In a simple regression of one independent variable, an important test is
whether the regression coefficient is significantly different from zero, or
the test of the hypothesis that there is no regression of y on x in the
population sampled. For the required test the theoretical model against
which the experimental results may be compared is the t-distribution.
The value of $t_0$ is calculated from the sample and the table of t is entered
with n = N − 2 to determine the probability of getting a value of t
greater than or equal to $\pm t_0$ in repeated sampling. Here $t_0$ is the ratio
of the regression coefficient, $b_{yx}$, to the standard error of $b_{yx}$; that is,

$$t_0 = \frac{b_{yx}}{s_b} \qquad (5.13)$$

The standard error of b is given by

$$s_b = \frac{s_{y \cdot x}}{\sqrt{\sum x^2}} \qquad (5.14)$$

where $s_{y \cdot x}$, the standard error of estimate, is obtained from

$$s_{y \cdot x}^2 = \frac{\sum (Y_0 - Y_E)^2}{N - 2} = \frac{\sum y^2 - \dfrac{(\sum xy)^2}{\sum x^2}}{N - 2} \qquad (5.15)$$

in which $Y_0$ is the observed value of Y and $Y_E$ is the value of Y estimated
from the regression equation. The number of degrees of freedom is
N − 2, since two statistics are estimates of two different parameter
values in the regression equation: $Y_E = a + bX$. The calculations and
procedures are illustrated in determining the regression coefficient and
in testing its significance in Table 26.

This problem was that of setting up a regression equation for the
purpose of predicting one character, Y, from a knowledge
of a second character, X. In this case it was desired to predict the score
of an individual on one form of an examination from his score on the
second form. The prediction equation is

$$Y_E = \bar{Y} + \frac{\sum xy}{\sum x^2} (X - \bar{X}) \qquad (5.16)$$

This equation is called the regression equation for estimating Y from X.
It is fitted to the observational data by the method of least squares.

In this regression problem, it is necessary to run two tests of significance:
(1) for the regression coefficient b, and (2) for the mean of the
dependent variable, $\bar{Y}$.

The test of significance for the regression coefficient, $b_{yx}$, is given by

$$t_0 = \frac{b_{yx}}{s_b}$$

In our problem the values are

$$t_0 = \frac{.9873}{.0729} = 13.5$$

We enter the table of t with n = N − 2 = 25 − 2 = 23 and find that
P < .001. Therefore, the regression coefficient is highly significant.


TABLE 26
CALCULATIONS FOR SETTING UP THE REGRESSION EQUATION BETWEEN THE SCORES ON TWO
FORMS OF A TEST

Individual      X       Y      X'     Y'     X'²     Y'²    X'Y'
     1         46      52     -4      2      16       4      -8
     2         38      38    -12    -12     144     144     144
     3         64      63     14     13     196     169     182
     4         73      65     23     15     529     225     345
     5         61      58     11      8     121      64      88
     6         34      33    -16    -17     256     289     272
     7         57      49      7     -1      49       1      -7
     8         66      63     16     13     256     169     208
     9         25      24    -25    -26     625     676     650
    10         30      26    -20    -24     400     576     480
    11         45      33     -5    -17      25     289      85
    12         73      71     23     21     529     441     483
    13         45      48     -5     -2      25       4      10
    14         55      63      5     13      25     169      65
    15         66      70     16     20     256     400     320
    16         49      46     -1     -4       1      16       4
    17         64      65     14     15     196     225     210
    18         45      46     -5     -4      25      16      20
    19         61      62     11     12     121     144     132
    20         52      46      2     -4       4      16      -8
    21         67      68     17     18     289     324     306
    22         59      53      9      3      81       9      27
    23         55      55      5      5      25      25      25
    24         51      52      1      2       1       4       2
    25         50      48      0     -2       0       4       0
  Total      1331    1297     81     47    4195    4403    4035

X and Y are scores on two tests.
X' = X − 50 and Y' = Y − 50.

$\bar{X} = 50 + \frac{81}{25} = 53.24$ = mean of X scores
$\bar{Y} = 50 + \frac{47}{25} = 51.88$ = mean of Y scores

$$\sum x^2 = \sum (X - \bar{X})^2 = \sum X'^2 - \frac{(\sum X')^2}{N} = 4195 - \frac{(81)^2}{25} = 4195 - 262.44 = 3932.56$$

$$\sum y^2 = \sum Y'^2 - \frac{(\sum Y')^2}{N} = 4403 - \frac{(47)^2}{25} = 4403 - 88.36 = 4314.64$$

$$\sum xy = \sum X'Y' - \frac{(\sum X')(\sum Y')}{N} = 4035 - \frac{(81)(47)}{25} = 4035 - 152.28 = 3882.72$$

Regression Equation

$$Y_E = \bar{Y} + \frac{\sum xy}{\sum x^2} (X - \bar{X}) = 51.88 + \frac{3882.72}{3932.56} (X - 53.24)$$
$$= 51.88 + .9873(X - 53.24) = 51.88 + .9873X - 52.56 = .9873X - 0.68$$

Significance of Regression Coefficient

$$s_b = \frac{s_{y \cdot x}}{\sqrt{\sum x^2}} = \frac{4.57}{\sqrt{3932.56}} = \frac{4.57}{62.71} = .0729 = \text{standard error of the regression coefficient}$$

$$t_0 = \frac{.9873}{.0729} = 13.5; \quad P < .001$$

Standard Error of Estimate

$$s_{y \cdot x} = \sqrt{\frac{\sum (Y_0 - Y_E)^2}{N - 2}} = \sqrt{\frac{\sum y^2 - \dfrac{(\sum xy)^2}{\sum x^2}}{N - 2}} = \sqrt{\frac{4314.64 - \dfrac{(3882.72)^2}{3932.56}}{23}}$$
$$= \sqrt{\frac{4314.64 - 3833.51}{23}} = \sqrt{\frac{481.13}{23}} = \sqrt{20.9189} = 4.57 = \text{standard error of estimate}$$

Test of Significance of $\bar{Y}$

$$t_0 = \frac{(\bar{Y} - a)\sqrt{N}}{s}, \quad \text{where } s = \sqrt{\frac{\sum (Y_0 - Y_E)^2}{N - 2}}; \quad n = N - 2$$

A simpler alternative test of the significance of the regression coefficient
can be made, where the correlation coefficient, r, is available and
under the conditions indicated below.

When the regression of y on x is linear and the arrays of y are normal
and homoscedastic⁵ (see Ref. 11), the t-test affords the exact test of the
significance of the deviation of a sample regression coefficient from any
hypothetical value (specified by the hypothesis tested) divided by an
estimate of its standard error, considered as a random sample of similar
estimates in repeated samples with the same values of x.

When the hypothesis under test is that the population value, ρ, is
zero and when the distribution of X is continuous, the t-test for $b_{yx}$ is also
an exact test for the sample correlation coefficient, r:

$$t = \frac{b}{s_b} = \frac{r \sqrt{N - 2}}{\sqrt{1 - r^2}} \qquad (5.17)$$

This equality is illustrated by calculating r for the set of data in Table 26.
Thus, with r = .94,

$$t_0 = \frac{.94 \sqrt{23}}{\sqrt{1 - (.94)^2}} = 13.2$$

which agrees with the value obtained from $b/s_b$ apart from the rounding of r.
Entering the t-table with n = 23, it is observed that P < .001. Therefore,
the observed value of b (or r) is highly significant.
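The regression calculations of Table 26 and the equality (5.17) can be retraced with the sketch below. The score lists are transcribed from Table 26; where the scan is illegible, the scores have been restored from the deviation columns and the column totals, which should be treated as an assumption of this example.

```python
import math

# Scores on the two forms for the 25 individuals of Table 26
X = [46, 38, 64, 73, 61, 34, 57, 66, 25, 30, 45, 73, 45,
     55, 66, 49, 64, 45, 61, 52, 67, 59, 55, 51, 50]
Y = [52, 38, 63, 65, 58, 33, 49, 63, 24, 26, 33, 71, 48,
     63, 70, 46, 65, 46, 62, 46, 68, 53, 55, 52, 48]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
sxx = sum((x - mean_x) ** 2 for x in X)                       # sum of x^2
syy = sum((y - mean_y) ** 2 for y in Y)                       # sum of y^2
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))  # sum of xy

b = sxy / sxx                                                 # regression coefficient
s_yx = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))            # standard error of estimate (5.15)
s_b = s_yx / math.sqrt(sxx)                                   # standard error of b (5.14)
t_b = b / s_b                                                 # (5.13)

r = sxy / math.sqrt(sxx * syy)                                # correlation coefficient
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)            # right-hand side of (5.17)

print(round(b, 4), round(s_yx, 2), round(t_b, 2), round(t_r, 2))
```

Carried to full precision, the two forms of t coincide; the small gap between 13.5 and 13.2 in the text arises only from rounding r to two places.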

For a test of the hypothesis that two regression coefficients $b_1$ and $b_2$,
obtained from two random samples of sizes $N_1$ and $N_2$, are from the same
population, $t_0$ is given by

$$t_0 = \frac{b_1 - b_2}{s \sqrt{\dfrac{1}{\sum x_1^2} + \dfrac{1}{\sum x_2^2}}} \qquad (5.18)$$

where

$$s^2 = \frac{\sum (Y_1 - Y_{E_1})^2 + \sum (Y_2 - Y_{E_2})^2}{N_1 + N_2 - 4} \qquad (n = N_1 + N_2 - 4) \qquad (5.19)$$

⁵ For tests of linearity and homoscedasticity, see page 241.

As an alternative, the significance of the difference between the two
correlation coefficients could be tested as in Problem V.11.

Problem V.14. The significance of the mean of the dependent variable
in a simple regression equation. The same set of data used in the
preceding problem may be presented to illustrate the test of significance
of the second estimate in the regression equation, the estimate of any
hypothetical value a. In this case the t-distribution may also be used.
The sample value of $t_0$ is

$$t_0 = \frac{(\bar{Y} - a) \sqrt{N}}{s} \qquad (5.20)$$

where

$$s = \sqrt{\frac{\sum (Y_0 - Y_E)^2}{N - 2}} \qquad (5.21)$$

Then for this sample and where a is specified as zero, $t_0$ becomes

$$t_0 = \frac{51.88 \sqrt{25}}{4.57} = 56.76 \qquad (n = 23)$$

Obviously, P < .001.
Applications of the Chi-Square Model. The chi-square (Ref. 21)
model has wide application in statistics, particularly as a test of significance
in dealing with enumerative data so characteristic of the study
of attributes. It is appropriate for testing whether a set of observed
values differs significantly from those which would occur if some specified
hypothesis were true. One general method of testing such a hypothesis
is to work out results which would be expected theoretically and then to
compare these with the observations.

Problem V.15. To test the effectiveness of principles of classification.
We may have individuals classified by two characteristics and wish
to test the hypothesis that the characteristics are independent or that the
principles of classification are independent.

In applying the x?-test to two or more classifications, usually the 
Statistical hypothesis under test is that the two characteristics upon 
Which the individuals have been classified are independent of one another, 
and then the truth or falsity of the hypothesis is tested. This procedure 
is equivalent to determining whether a set of obtained values differs 
Significantly from those which would result if only chance factors were in 
operation. 

In the following example the $\chi^2$ test is applied to a 2 × 2-fold contingency
table (Table 27).

Here 366 twins have been classified on the basis of two characteristics
according to (1) their genetic constitution, that is, according to whether
they are identical or fraternal twins, and (2) the presence or absence of
mental deficiency. The numbers of identical and fraternal twins are
recorded in the marginal totals in the last column of the table of observed
values. The number of concordant and disconcordant twins with respect
to mental deficiency is given in the marginal totals in the last row of the
table.

The x?-test is applied to determine the independence of these two 
factors. The geneticist or psychologist might state the problem thus: 
Assuming that the data are accurate, homogeneous, and unselected, with 
what frequency could so large a disproportion between the two classes of 
twins arise if the same causes leading to mental deficiency had been 
operative on the two? 


TABLE 27
CONCORDANCE AND DISCONCORDANCE IN IDENTICAL AND FRATERNAL TWINS FOR
MENTAL DEFICIENCY (After Rosanoff, Ref. 20)

Observed values

Type                  Number          Number            Total
                      concordant      disconcordant
Identical twins       115 (a)          11 (b)            126
Fraternal twins       128 (c)         112 (d)            240
Total                 243             123                366

Expected values

Type                  Number          Number            Total
                      concordant      disconcordant
Identical twins        83.66 (a)       42.34 (b)         126
Fraternal twins       159.34 (c)       80.66 (d)         240
Total                 243.00          123.00             366

The number of observations to be expected in each cell where only
chance factors are operative can be calculated from the total frequency
in this way: Multiply the total number of identical twins, 126, by the
total number of concordant twins, 243, that is, 126 × 243 = 30,618, and
divide this product by the total number of twins in the sample, 366, that
is, 30,618/366 = 83.66. The expected number in the other cells of the
tables can be calculated in the same way. This need not be done, however,
in a 2 × 2 table. Since the marginal totals are fixed, the expected
values for only one cell need be calculated, the others being filled in by
subtraction. Thus, the expected value for cell b is 126 − 83.66 = 42.34;
that for cell c is 243 − 83.66 = 159.34; and that for cell d is 123 − 42.34
= 80.66.
$\chi^2$ is given by the formula

$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e} \qquad (5.22)$$

where $f_0$ stands for observed frequency and $f_e$ for expected frequency.
The square of the differences between the observed and expected values
is divided by the expected value for each cell. These quotients are
summed to give $\chi^2$.

The calculations for the above data are presented in Table 28. 


TABLE 28
THE CALCULATION OF χ² FOR THE DATA IN TABLE 27

Cell       f_0       f_e       f_0 − f_e     (f_0 − f_e)²     (f_0 − f_e)²/f_e
 a         115      83.66        31.34         982.1956            11.74
 b          11      42.34       −31.34         982.1956            23.20
 c         128     159.34       −31.34         982.1956             6.17
 d         112      80.66        31.34         982.1956            12.18
Total      366     366.00         0.00                      χ_0² = 53.29


The calculated value $\chi_0^2$ is used to determine the probability of getting,
on a random sample, the value of $\chi^2$ equal to or higher than $\chi_0^2$ in repeated
sampling. The alternative is the probability that the difference between
the observed and expected values may be attributable to chance alone.

This probability is obtainable from Table III, Appendix, Distribution
of $\chi^2$. The number of degrees of freedom with which the table is entered
is in this problem equal to 1, since it was observed that only one of the
cell frequencies could be filled in independently. When this quantity is
specified, the other cells can be filled in by using the marginal totals.

Therefore, we enter the $\chi^2$-table with a value of $\chi_0^2 = 53.29$ and n = 1.
It is noted that for values of $\chi^2$ greater than 10.827 the probability that
the differences between the observed and expected frequencies could
have arisen by chance is less than .001. The table does not give the
exact probability for a value of $\chi^2 = 53.29$, but it is less than 1
in 1000. Hence it may be concluded that the principle of classification
used in this problem was effective; that is, type of twin and mental
deficiency are associated.

It may be pointed out that in a 2 × 2 table the value of $\chi^2$ could have
been obtained directly from the formula

$$\chi^2 = \frac{(ad - bc)^2 (a + b + c + d)}{(a + b)(c + d)(a + c)(b + d)} \qquad (5.23)$$

In our problem,

$$\chi_0^2 = \frac{[(115)(112) - (11)(128)]^2 (366)}{(126)(240)(243)(123)} = 53.29$$
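Both routes to $\chi^2$ for the twin data, the cell-by-cell sum of (5.22) and the direct formula (5.23), can be compared in a few lines. This is a sketch only; the function name `chi_square_2x2` is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Return expected frequencies and chi-square computed two ways for a 2 x 2 table."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    # Expected frequency = row total * column total / grand total
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)] for i in range(2)]
    chi2_cells = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2) for j in range(2)
    )                                                               # equation (5.22)
    chi2_direct = (a * d - b * c) ** 2 * n / (
        row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
    )                                                               # equation (5.23)
    return expected, chi2_cells, chi2_direct

expected, chi2_cells, chi2_direct = chi_square_2x2(115, 11, 128, 112)
print([[round(e, 2) for e in row] for row in expected])
print(round(chi2_cells, 2), round(chi2_direct, 2))
```

The two values agree, as they must, since (5.23) is an algebraic rearrangement of (5.22) for the fourfold table.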
The correction for continuity, devised by Yates (Ref. 25), is useful
for extending the application of the $\chi^2$-test of significance to contingency
tables with small frequency, that is, to data in which the expectations
are small. Cochran (Ref. 6) has presented and illustrated the principles
involved in correcting for continuity on some applications of $\chi^2$.

The process of calculating $\chi^2$ for a 2 × 2 table can be extended to the
general case of the r × c contingency table. In general, in a table of
r rows and c columns the number of degrees of freedom in $\chi^2$ is (r − 1)
(c − 1). Bartlett (Ref. 1) devised a method of calculating $\chi^2$ for
multiple-dichotomous tables, that is, those of the form $2^n$. Norton
(Ref. 17) presented and illustrated a method of successive approximation
for obtaining the R departures from expectation in a complex contingency
table of the form $2^n \times R$.

Problem V.16. To test the homogeneity of two or more frequency
distributions. A useful application of the $\chi^2$-test is in testing the
hypothesis that two or more frequency distributions could have come from
the same homogeneous population. This is a more stringent test than
those tests of the significance between certain summary statistics of the
distributions, since by it the distributions are compared in all respects.
Furthermore, it is possible to separate the contributions to $\chi^2$ of the
individual degrees of freedom, and so to test the distributions by parts.

The following example illustrates the case where there are two distributions
and n' classes with n' − 1 degrees of freedom. The method of
calculating $\chi^2$ devised by Brandt and Snedecor (Ref. 11) is followed.

The two samples are distributions of two groups of freshmen entering
a particular college of the University of Minnesota classified according
to college aptitude-test rating. One distribution, of 475 students, presented
two units of high-school mathematics; the other, of 111 students,
presented three units of high-school mathematics at the time of entrance.
We wish to test the hypothesis that these two samples are from the same
homogeneous population with respect to aptitude as measured, or whether
there is a significant difference between the two distributions.

If we denote the column of frequencies of the group with two units
by a', that of the group with three units by a, the value of $\chi^2$ is given by
the formula

$$\chi^2 = \frac{1}{\bar{p}\bar{q}} \left[ \sum (ap) - n\bar{p} \right] \qquad (5.24)$$

where $p = \dfrac{a}{a + a'}$ for each class, $n = \sum a$, $\bar{p} = \dfrac{\sum a}{\sum a + \sum a'}$, and $\bar{q} = 1 - \bar{p}$.

The calculations of $\chi^2$ for the test of significance of the homogeneity
of the two frequency distributions are given in Table 29.

TABLE 29
CALCULATION OF χ² FOR TWO FREQUENCY DISTRIBUTIONS, ONE WITH TWO UNITS
OF HIGH-SCHOOL MATHEMATICS, THE OTHER WITH THREE, GROUPED ACCORDING TO
PERCENTILE RANKS ON THE COLLEGE APTITUDE TEST

Class intervals      Units of high-school
in percentile        mathematics               p = a/(a + a')        ap
ranks                Two (a')    Three (a)

 91-100                18          10            .357143          3.571430
 81- 90                33          12            .266666          3.199992
 71- 80                39          14            .264151          3.698114
 61- 70                43           3            .652087          1.956261
 51- 60                39          13            .250000          3.250000
 41- 50                51           3            .055550          0.166650
 31- 40                47          12            .203390          2.440680
 21- 30                66          14            .175000          2.450000
 11- 20                68           8            .105263          0.842104
  0- 10                71          22            .236559          4.204298
 Total                475         111            .189420 (p̄)    25.779529 (Σap)

$$\chi_0^2 = \frac{1}{\bar{p}\bar{q}} [25.779529 - (111)(.18942)] = 30.96; \quad P < .001; \quad \text{for } n = 9, \ \chi^2_{.001} = 27.877$$

For a $\chi_0^2 = 30.96$ with n = 9, we enter the $\chi^2$-table and find that for
values of $\chi^2$ greater than 27.877 the divergencies between the observed
frequencies in the two distributions could have arisen by chance in less
than .001. We do not know the value of P, corresponding to a value of
$\chi^2 = 30.96$, but the probability of such a divergence arising by chance is
less than 1 in 1000. We may conclude, therefore, that there is a statistically
significant difference between the two distributions. The
pedagogical conclusion is that groups presenting three units of high-school
mathematics are superior on the whole on the College Aptitude Test to the
groups presenting two units.
It is possible to separate the contributions to $\chi^2$ from each of the
individual degrees of freedom, and so to test the distributions by parts.
For 4 degrees of freedom the calculations for $\chi^2$ are

Percentile ranks
on College           Two units     Three units     Total         p
Aptitude Test
 81-100                 51             22            73       .301370
 61- 80                 82             17            99       .171717
 41- 60                 90             16           106       .150943
 21- 40                113             26           139       .187050
  0- 20                139             30           169       .177515
 Total                 475            111           586       .189420 (p̄)

$\chi_0^2 = 7.3438$
P > .05
For 1 degree of freedom:

P.R.C.A.T.           Two units     Three units     Total         p
Above 80 P.R.           51             22            73       .301370
80 and below           424             89           513       .173490
Total                  475            111           586       .189420 (p̄)

$\chi_0^2 = 6.8066$
$\chi^2_{.01} = 6.635$
P < .01

The portion of the distribution contributing the most to the differences
is, accordingly, in the highest percentile ranks, or 81-100.⁷
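The Brandt and Snedecor form (5.24) lends itself to a compact check on the grouped tables above. The sketch assumes the frequencies as grouped into five classes and into two classes; the function name `brandt_snedecor` is illustrative.

```python
def brandt_snedecor(a_prime, a):
    """Chi-square for two frequency distributions by formula (5.24)."""
    n = sum(a)                          # total of the a column
    n_prime = sum(a_prime)              # total of the a' column
    p_bar = n / (n + n_prime)
    q_bar = 1.0 - p_bar
    # a * p with p = a / (a + a') for each class
    sum_ap = sum(x * x / (x + y) for x, y in zip(a, a_prime))
    return (sum_ap - n * p_bar) / (p_bar * q_bar)

# Five classes (4 degrees of freedom): two-unit and three-unit frequencies
chi2_five = brandt_snedecor([51, 82, 90, 113, 139], [22, 17, 16, 26, 30])
# Two classes (1 degree of freedom): above 80 P.R. versus 80 and below
chi2_two = brandt_snedecor([51, 424], [22, 89])
print(round(chi2_five, 4), round(chi2_two, 4))
```

The number of degrees of freedom in each case is one less than the number of classes.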

⁷ The reduction in the $\chi^2$-values with coarser grouping of the data is noted. This
result is to be expected with the reduction in the number of degrees of freedom and the
corresponding approach to the zero tail of the $\chi^2$-distribution.

Problem V.17. To test the agreement between a theoretical and an
observed distribution. One general method of testing a statistical
hypothesis is to work out the results which would be expected theoretically
under the assumption that the hypothesis is true, and then to
compare these with the observations. The chi-square test provides an
efficient test of the goodness of fit. As an illustration we shall test the
hypothesis that a set of data presented by Roberts et al. (Ref. 19) is
described by a Poisson series.

The data given in Table 30 were obtained in administering the Binet
Test (a shortened form) to a group of children who passed all but one
of a complete year of tests, then by extending the testing downward to
determine how many of these pupils failed in one, two, or more tests.

TABLE 30
ADDITIONAL TESTS FAILED ON DOWNWARD EXTENSION OF THE BINET SCALE TO A
SAMPLE OF 131 CHILDREN
(After Roberts, Ref. 19)

Number of        Observed          Expected           χ²†
tests failed     frequency, f_0    frequency,* f_e
     0               88               87.41           0.004
     1               34               35.37           0.053
     2                8                7.16  \
     3                1                0.97   |       0.070
     4                0                0.10   |
     5                0                0.01  /
   Total            131              131.02           0.127

* The theoretical distribution is obtained as follows:

1. Calculate the mean number of tests failed: $\bar{X} = \frac{53}{131} = 0.4046$.
2. Calculate the expected frequency. This is done by means of logarithms.
Thus:

Quantity                        Logarithm           Expected frequency
n = 131                         2.11727
e^m = e^.4046                   (m log e) = (.4046)(.43429) = 0.17571
n/e^m                           1.94156               87.41
m = 0.4046                      9.60703 − 10
mn/e^m                          1.54859               35.37
m                               9.60703 − 10
                                1.15562
2                               0.30103
m²n/2e^m                        0.85459                7.155
m                               9.60703 − 10
                                0.46162
3                               0.47712
m³n/(2)(3)e^m                   9.98450 − 10           0.965
m                               9.60703 − 10
                                9.59153 − 10
4                               0.60206
m⁴n/(2)(3)(4)e^m                8.98947 − 10           0.0976
m                               9.60703 − 10
                                8.59650 − 10
5                               0.69897
m⁵n/(2)(3)(4)(5)e^m             7.89753 − 10           0.007898

† Chi-square is determined in the usual manner by calculating $\sum \frac{(f_0 - f_e)^2}{f_e}$. The
classes from 2 through 5 have been grouped because of small frequencies. This
grouping could have been done without calculating the theoretical values for the
classes beyond the third. The calculations were made to illustrate the method.
$\chi_0^2 = 0.127$ with n = 1. The corresponding probability value is .70 < P < .80.
There is 1 degree of freedom, since the sample mean has been used as the parameter
of the Poisson distribution and the sample number has been used to calculate the
theoretical frequencies.

From the calculations it is noted that for a $\chi^2 = 0.127$ and with n = 1,
the corresponding probability is between .70 and .80. Therefore, we
may conclude that the Poisson distribution provides a good fit to this
set of data.
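The Poisson fit can be reproduced without logarithm tables. In the sketch below the observed counts for two and three tests failed (8 and 1) are an assumption: they are the only values consistent with the total of 131 children and the mean of 0.4046, since the scanned table is illegible at those entries.

```python
import math

# Observed frequencies for 0, 1, ..., 5 tests failed.
# The counts 8 and 1 are inferred from the total (131) and the mean (0.4046).
observed = [88, 34, 8, 1, 0, 0]

n = sum(observed)
m = sum(i * f for i, f in enumerate(observed)) / n        # mean number of tests failed

# Poisson expected frequencies: n * e^(-m) * m^i / i!
expected = [n * math.exp(-m) * m ** i / math.factorial(i) for i in range(len(observed))]

# Group classes 2 through 5 because of small frequencies
observed_grouped = observed[:2] + [sum(observed[2:])]
expected_grouped = expected[:2] + [sum(expected[2:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed_grouped, expected_grouped))
df = len(observed_grouped) - 2      # one lost for the total, one for the fitted mean
print(round(m, 4), [round(e, 2) for e in expected], round(chi2, 3), df)
```

The value of $\chi^2$ obtained this way differs slightly in the third decimal from the tabled 0.127, because the expected frequencies are not rounded before the quotients are formed.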
PROBLEMS

1. The following are two distributions, A and B, from the Miller Analogies
Test. Determine whether they are random samples from the
same population.

            A                          B

8 58 43 77 53 79 48 66 50 65 75 48 

51 56 99 57 75 67 47 53 99 70 53 79 

80 88 76 8 48 86 67 75 76 69 75 6 

77 69 М 83 89 77 75 54 69 78 48 96 

56 72 71 27 48 90 76 75 76 67 89 84 


2. The following data are from an experiment comparing the relative
efficacy of two different methods of teaching beginning high-school
algebra. There are XXIV pairs of students, paired on the basis of
chronological age and a pretest in arithmetic. There were two
criteria of achievement: (1) scores on an inventory test, (2) scores
on an achievement test. The values under "Exp." and "Con."
refer to the experimental and control groups, respectively. Test the
null hypothesis in this experiment.


DATA FOR PROBLEM 2

            Chron. Age       Arith.         Inventory      Achievement
Pairs      Exp.    Con.    Exp.    Con.    Exp.    Con.    Exp.    Con.
I          152     157     99      96      57      57      50      33
II         172     157     87      87      61      67      32      20
III        173     177     85.5    86.5    60      62      24      28
IV         169     166     85      86.5    55      55      28      19
V          160     156     85      86.5    50      50      33      23
VI         168     162     82.5    82      50      50      31      28
VII        171     169     96.5    97.5    56      57      36      28
VIII       160     156     92      91      56      56      43      31
IX         177     171     88      86      60      60      33      21
X          165     161     85.5    86.5    57      56      33      21
XI         164     165     87.5    84.5    57      56      38      27
XII        167     161     84.5    83      56      56      28      28
XIII       171     171     96.5    96      57      57      35      20
XIV        168     169     99.5    99      50      51      42      24
XV         175     177     83      80.5    56      56      29      27
XVI        172     175     93      90      56      56      41      18
XVII       169     170     81.5    79      56      57      35      20
XVIII      161     167     90      87.5    56      56      36      26
XIX        165     171     87.5    87.5    56      56      28      28
XX         174     168     83      85.5    56      56      20      27
XXI        176     175     93      94      51      50      29      29
XXII       170     165     77      79.5    42      50      42      29
XXIII      174     172     77      79.5    56      56      29      24
XXIV       174     170     86      86      50      50      38      30


3. (a) In Problem 2 determine the statistical significance of the
differences in achievement of the two groups by only considering the
signs of the respective differences between the scores of individual
pair members.
(b) Compare the efficiency of the test used in (a) with that of the
test used in Problem 2.

4. Determine the significance of the difference of the percentage of
those taking the second test, reaching or exceeding the median
score of the group taking the pretest in the fall of 1935.

[Figure: Scores on Algebra Test, plotted against Percentile Rank on
Algebra Test, for Winter 1936 (N1 = 278) and Gen. College Pretest,
Fall 1935 (N2 = 125).]

5. For the following distribution calculate (a) the variance from the
grand mean; (b) the variance from the sample means.
(c) Note the extent of agreement.
(d) Why is it necessary to pay proper regard to the number of
degrees of freedom?

[distribution partly illegible: . . . 20 21 15 24 28 29]

6. A check-up on the reading habits of seventh-grade pupils reveals
that 55 per cent of the 558 voluntary readings of one random sample
of pupils was mystery and detective, where only 45 per cent of the
[number illegible] voluntary readings of another random sample of
pupils was of this classification. Is there statistical evidence here
that interest in the mystery and detective type of reading is higher
in one sample than in the other?
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7. In an attitude test administered to an experimental group of 796 
students and a control group of 861, Item 306 was answered correctly 
by 51 in the experimental group and by 47 in the control group. 
(a) Is there a statistically significant difference between the propor- 
tion of the experimental group that answered this item correctly and 
the proportion of the control group that answered it correctly? What 
is the statistical hypothesis tested? (b) In Item 35, 37 of the experi- 
mental group and 37 of the control group answered this item cor- 
rectly. Answer the above questions in regard to this item. 

8. The following measures were obtained from an examination in per-
sonal hygiene for a winter quarter class and for a spring quarter class.
Determine the significance of the difference between means. May
the variances be assumed equal? What hypothesis is under test?
What is the most appropriate test of the hypothesis?

Winter quarter class:
Mean = 20.56
Sum of squares of deviations from the mean = 28,255
Number = 675
Spring quarter class:
Mean = 22.07
Sum of squares of deviations from the mean = 12,535
Number = 350


9. In a given situation n = 81, mean = 40, and standard deviation = 8.
If we assume that the standard deviation of the increased number of
cases will remain approximately the same as given, what size of
sample is necessary to reduce the standard error of the mean to .5?

10. The following data indicate the frequency of intrapair differences in
handedness in identical twins and in the handedness of their immedi-
ate relatives:

                                   Identical twins
                                   R-R       R-L
Without left-handed relatives      105        25
With left-handed relatives          26        22
Total                              131        47

Is the principle of classification effective?


11. Following are two distributions of entering freshmen, the one having
had no high-school work in foreign languages, the other having had
two or more units in foreign languages. Test the independence of
the two distributions as wholes and by parts.

Frequency subgroups,          Units in high-school
percentile ranks on           foreign language
College Aptitude Test         None      Two or more
91-100                          8           20
81- 90                          7           27
71- 80                         11           33
61- 70                          8           29
51- 60                         15           25
41- 50                         10           34
31- 40                    [illegible]       33
21- 30                         35           38
11- 20                         24           50
 1- 10                    [illegible]   [illegible]
                          N1 = 174      N2 = 389


12. The following data were obtained from four random samples of enter-
ing freshmen on a chemistry aptitude test:

Entering Group      N      Mean      s
1938               35     18.66     3.58
1939               48     17.23     4.75
1940               42     18.67     4.95
1941               30     19.53     3.09
Total             155     18.39     4.33

Test the homogeneity of the standard deviations.

13. The following coefficients of correlation were reported between intelli-
gence quotients (X) and chronological ages (Y) for two random
samples of students in a course in elementary-school science. Test
the significance of the difference between the two correlation
coefficients:

$r_{xy} = -.507$
$r_{xy} = -.455$
[one sample has N = 66; the other sample size is illegible]

14. The following correlation coefficients were obtained upon a random
sample of 74 pupils in the sixth grade of an elementary school:

$r_{xy} = .59$    $r_{xz} = .815$    $r_{yz} = .44$

where x = score on an initial achievement test
      y = mental-age score
      z = score on a final achievement test.

Test the significance of the difference between two of these
coefficients [subscripts illegible].

15. Test the significance of the differences among the following corre-
lation coefficients reported for the illustrative problem in multiple
correlation (see page 982).

$r_{xy} = .6704$ [remaining coefficients illegible]
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22. 


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CHAPTER VI 
THE ESTIMATION OF POPULATION PARAMETERS 


The Problem of Estimation. The estimation of characteristics of a 
population, that is, the estimation of the parameter values of the popula- 
tion, is a fundamental statistical problem. In such a problem we usually 
begin with an assumption about or knowledge of the mathematical form 
of the population of which we presume to have a random sample. We do 
not have a knowledge of the values of one or more parameters in the
mathematical form. These values are required for the complete
specification of the population.

In general, there are a number of ways of estimating a parameter 
from sample data, some of which may be better than others. The theory 
of estimation provides a basis for investigating the conditions which an 
estimate should fulfill, for determining the best estimate to use under 
given circumstances, and for comparing the relative effectiveness of 
different estimates that might be used. 

In its most practical form, the problem of estimation is met with 
by the research worker in his attempt to reduce his original data to a few 
summary quantities which shall contain all the relevant information, 
that is, all information which is of use in estimating the values of the 
parameters. The problem of estimation is closely related to that of 
distribution, since both arise in the process of reducing data. From the 
logical standpoint, problems of distribution precede problems of estima- 
tion, since knowledge of the random distributions of various alternative 
statistics, derived from samples of a given size, is basic in the selection 
of the particular statistic most useful to calculate.

The problem of specification, or the specification of the mathematical 
form of the distribution of the hypothetical population from which a 
sample is assumed to have been drawn, completes the theoretical basis 
upon which depends the solution of the problems which arise in the reduc- 
tion of data. Although the three problems may be studied separately, evi- 
dently they are closely related in the development of statistical methods. 
Our purpose here is to study especially the problem of estimation. This 
is the problem of determining how observational data can be best com- 
bined to yield the most accurate estimates obtainable of the unknown 
parameters. Two procedures of estimation are considered: (1) estimation 
by a point and (2) estimation by an interval.

In order to judge whether one particular estimate or a group of esti- 
mates is better than others, criteria are needed. Three criteria have been 
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advanced: (1) consistency, (2) efficiency, and (3) sufficiency. Statistics 
which satisfy these criteria are known as optimum estimates or optimum 
statistics (Ref. 13). 


CHARACTERISTICS OF GOOD ESTIMATES

In order to be consistent, the value of a statistic must approach more 
and more closely the estimated parameter as the sample size is indefinitely 
increased. Such a value is a function of the observations, which con- 
verges stochastically to a population parameter as the sample number 
approaches infinity. An efficient estimate is one whose sampling distribu- 
tion tends to the normal law with the least possible standard error as the 
number of observations is increased. Efficiency requires that the vari- 
ance of the estimate (at least for large samples) should not exceed that 
of any other consistent statistic estimating the same parameter. The 
square of the ratio of the minimum standard error to the standard error 
of another estimate (also normally distributed in the limit) gives a 
measure of the relative efficiency of the second estimate. The criterion 
of sufficiency is satisfied by a statistic when no other statistic calculable 
from the same sample can supply any additional information regarding 
the parameter under estimation. A sufficient statistic is inevitably also 
100 per cent efficient, since it incorporates the whole of the information 
available in the sample in regard to a given parameter. 
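
The measure of relative efficiency described above can be illustrated by a small sampling experiment. The sketch below is an illustration under stated assumptions, not part of the original text: it compares the sample median with the sample mean as estimates of the center of a normal population, using simulated samples. The sample size, the number of replications, the seed, and the function name are all arbitrary choices made here.

```python
import random
import statistics

def relative_efficiency_of_median(sample_size=25, replications=4000, seed=1):
    """Estimate the efficiency of the median relative to the mean.

    The efficiency is the square of the ratio of the standard error of
    the mean (the smaller one) to the standard error of the median,
    both estimated from repeated samples of a standard normal population.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    se_mean = statistics.pstdev(means)
    se_median = statistics.pstdev(medians)
    return (se_mean / se_median) ** 2

efficiency = relative_efficiency_of_median()
print(round(efficiency, 2))
```

For normal samples the large-sample value of this ratio is known to be $2/\pi$, or about 0.64; a simulated value with a moderate sample size will be near that figure but not equal to it.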

The Measurement of Amount of Information. It is apparent that 
these criteria for judging the goodness of estimates require the knowledge 
of the amount of information that is available in any sample relevant to 
the population parameter under estimation. Fisher (1921, 1925) showed 
how to measure the quantity of information provided by the observa- 
tional data, relevant to the value of any particular unknown quantity. 
The mathematical quantity used to specify the amount of measurable 
information is the reciprocal of the variance, or the invariance, of the 
estimate.

The class of estimates which, as the sample is increased without limit, 
tend to be distributed about their limiting value (their mathematical 
expectation) in the normal distribution is the one appropriate to the 
theory of large samples. The amount of information afforded by an 
estimate normally distributed with variance V is 1/V, the invariance of 
that normal distribution. In the normal case, the variance decreases 
with increasing size of sample, n, always ultimately in inverse proportion 
to n. 
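
As a simple case, consider the mean of n observations from a normal population with known standard deviation $\sigma$, for which $V = \sigma^2/n$. The sketch below is only illustrative; the value $\sigma = 10$ and the function names are assumptions introduced here. It shows that the information $1/V$ grows in proportion to n while the product $nV$ stays fixed.

```python
def variance_of_mean(sigma, n):
    """Sampling variance V of the mean of n observations, sigma known."""
    return sigma ** 2 / n

def information(variance):
    """Amount of information, taken as the reciprocal of the variance."""
    return 1.0 / variance

sigma = 10.0  # assumed population standard deviation, for illustration only
for n in (5, 50, 100):
    v = variance_of_mean(sigma, n)
    # Columns: sample size, V, information 1/V, and the product nV.
    print(n, v, information(v), n * v)
```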

The criterion of efficiency, noted above, is that the limiting value of 
nV, where V is the variance of the estimate, shall be as small as possible. 
Fisher (Ref. 11) proved mathematically that the limiting value of 1/nV
cannot exceed a quantity i, the amount of information provided by each
observation, the value of which is independent of the method of estima-
tion. It was shown that the reciprocal of the variance, or the invariance
of the estimate, cannot exceed the amount of information in the sample.
Thus:

$$\frac{1}{V} \le ni = I \qquad (6.01)$$

This conclusion is dependent on proof that for certain estimates the limit-
ing value of

$$\frac{1}{nV} = i \qquad (6.02)$$

The Maximum Likelihood Estimate. The instrument supplied by 
Fisher for obtaining the estimates necessary for the limiting value (6.02) 
to hold is the method of maximum likelihood. By this method, estimates 
of the parameters are obtained which maximize the likelihood function 
and have the smallest limiting variance. The limiting value of the
sampling variance of the maximum likelihood estimate in large samples
was proved to be given by

$$nV = \frac{1}{i}$$

We may state here that the probability of occurrence of a sample is 
expressible as a function of the unknown parameters, and the likelihood 
is defined as a function of these parameters proportional to this probabil- 
ity. Thus, the method of maximum likelihood gives as estimates those 
values which maximize the probability that the totality of observations 
should be that observed if the hypothesis which specifies the parameters 
of the population sample is true. 
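
A minimal numerical sketch of the method follows, assuming a Poisson population with unknown parameter m. The counts are hypothetical and the grid search is chosen only for transparency; neither comes from the text. For the Poisson case the value that maximizes the likelihood is known to equal the sample mean, which the sketch confirms.

```python
import math

def poisson_log_likelihood(m, counts):
    """Logarithm of the probability of the observed counts for parameter m."""
    return sum(x * math.log(m) - m - math.lgamma(x + 1) for x in counts)

# Hypothetical observed counts (not data from the text).
counts = [0, 1, 0, 2, 1, 0, 3, 1, 0, 2]

# Search a grid of trial values of m from 0.001 to 5.000 for the maximum.
grid = [k / 1000 for k in range(1, 5001)]
m_hat = max(grid, key=lambda m: poisson_log_likelihood(m, counts))

sample_mean = sum(counts) / len(counts)
print(m_hat, sample_mean)
```

Maximizing the logarithm of the likelihood gives the same estimate as maximizing the likelihood itself, since the logarithm is an increasing function.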

In large samples the maximum likelihood estimate has the smallest 
variance in comparison with any other statistic which is in the limit
normally distributed. If the comparisons were restricted to statistics 
which in the limit are normally distributed, the utility of this method of 
estimation would be greatly limited. However, a stronger property 
than efficiency is possessed by the maximum likelihood estimate. This 
property exists when estimates may be made which contain within them- 
selves the whole of the information available for finite samples. This is 
the property of sufficiency. Where sufficient statistics exist, all the 
available information is contained in the maximum likelihood estimate. 
In random samples from a normal population, the mean and the standard 
deviation (the only two characteristics necessary to specify this
population) are sufficient statistics. It is this fact that gives the great
simplicity to the problems falling within the theory of errors. Thus, in much
experimental work it is necessary to be concerned only with the precision 
of the sum, or mean, of the observational values and with the estimation 
of this precision from the sum of squares calculated from the data. These 
two quantities contain all the information provided by the data with 
respect to the mean and variance of the hypothetical normal model. In 
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cases where no sufficient statistic exists, Fisher has shown how the infor- 
mation in the sample may be recovered by using as ancillary the con- 
figuration of the sample. The configuration serves to indicate the 
precision of the estimate made, although it gives no information about 
the value of the parameter itself. 

In experiments where the variance of the population is not known, it 
must be estimated from the data. Such an estimate is itself subject to 
error. For this error, exact allowance is made in the distribution of t 
when we test the significance of the deviation of the observed value from a 
hypothetical value specified by hypothesis. In such cases it would be 
inexact to assume that the amount of information provided by the 
experimental results with respect to the true value under estimation 
would be given by $1/s^2$, the reciprocal of the sampling variance. In
determining the absolute precision of the experimental result, not only
the estimate, $s^2$, derived from the data but also the number of degrees
of freedom used in the estimate need to be taken into account. In this
case it has been shown (Ref. 11, page 249) that the amount of information
provided by an observed value, x, relative to the unknown mean
population value, $\mu$, is given by

$$\frac{n + 1}{(n + 3)s^2} \qquad (6.03)$$

where n is the number of degrees of freedom.
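
Formula (6.03) can be evaluated directly. In the sketch below the function name and the numerical values are assumptions made for illustration. It shows that the allowance for estimating the variance reduces the information below $1/s^2$ by the factor $(n + 1)/(n + 3)$, which approaches unity as the degrees of freedom increase.

```python
def information_with_estimated_variance(s_squared, degrees_of_freedom):
    """Amount of information (n + 1) / ((n + 3) s^2), as in formula (6.03)."""
    n = degrees_of_freedom
    return (n + 1) / ((n + 3) * s_squared)

s_squared = 4.0  # hypothetical estimate of the sampling variance
for n in (2, 9, 30, 1000):
    ratio = information_with_estimated_variance(s_squared, n) * s_squared
    # ratio is the fraction of the naive information 1/s^2 that is retained
    print(n, round(ratio, 4))
```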
Other Methods of Estimation. The most important general method
of estimation so far discovered, at least from the theoretical standpoint,
is the method of maximum likelihood. It will be frequently encountered
in later discussions. There are other methods of estimation which should
be considered. Under certain conditions all methods may yield similar
results.

The oldest general method of forming estimates of the parameters
of a distribution from sample values is the method of moments introduced
by Karl Pearson, in which sample moments are equated to the corre-
sponding moments of the distribution which are functions of the unknown
parameters. As many moments as there are parameters requiring esti-
mation are taken into account. The obtained equations with reference
to the parameters are solved to give the estimates of the parameters.
The fitting of the normal curve to a series of observations illustrates
the process of the method of moments. The method will often
involve relatively simple calculations, but its [efficiency]
decreases when the variations among the observations depart widely
[from normality].

The criterion of testing the closeness of an estimate by means of a
minimum standard deviation of its sampling distribution has been con-
sidered. Likewise, the criterion of testing the [goodness of fit of the data]
to certain parameters by a minimum $\chi^2$ value has been used. Both
these criteria are satisfied by the method of maximum likelihood in deal-
ing with large samples.
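
The method of moments described above can be sketched for the normal curve, which has two parameters and therefore requires two moments. The code is an illustration only; the observations are hypothetical and the function name is an assumption. The first sample moment is equated to the population mean and the second moment about the mean to the population variance.

```python
from math import sqrt
from statistics import fmean

def fit_normal_by_moments(values):
    """Return (mean, standard deviation) obtained by equating moments.

    The second moment is taken about the sample mean with divisor n,
    which is the moment estimate rather than the unbiased estimate.
    """
    mean = fmean(values)
    second_central_moment = fmean([(x - mean) ** 2 for x in values])
    return mean, sqrt(second_central_moment)

# Hypothetical observations (not data from the text).
observations = [2, 4, 4, 4, 5, 5, 7, 9]
fitted_mean, fitted_sd = fit_normal_by_moments(observations)
print(fitted_mean, fitted_sd)  # 5.0 2.0
```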

The original use of the $\chi^2$-test by Pearson (Ref. 27) was in the case
of a completely specified hypothetical distribution. In this case it was
established that $\chi^2$, under the assumption that the hypothesis is true, is
distributed in repeated sampling in a $\chi^2$-distribution with r - 1 degrees
of freedom (r is the number of groups into which the sample values have
been classified). Most often in practice, the hypothetical distribution
contains one or more unknown parameters. In these cases certain
modifications were necessary in finding the limiting distribution of $\chi^2$.
Fisher (Refs. 9 and 4) showed that, for certain important methods of
estimation, the modification could be made by reducing the number of
degrees of freedom of the limiting distribution of $\chi^2$ by one for each
estimated parameter.

The method of estimation yielding a minimum $\chi^2$ value is known as
the $\chi^2$ minimum method of estimation. In practice, the method often
leads to difficult solutions, so that certain modifications have resulted
in what is known as the modified $\chi^2$ minimum method (Ref. 2, page 426).
In certain cases this method is identical with the maximum likelihood
method. In the case of fitting certain distributions, for example the
binomial and Poisson distribution, and the normal distribution, the two
methods give the same results. The method of maximum likelihood,
however, can be extended to problems more general in nature.

A method of estimation developed by Markoff (Refs. 21 and 26) is
based on the principle of unbiased estimates. Markoff has shown in
various cases how to construct linear forms in the observational data
which give estimates of certain unknown parameters that have no bias
and the variances of which have the smallest possible value. The process
of obtaining the best unbiased estimate of the population variance, $\sigma^2$, is
based on this principle, for example, $s^2 = \sum (X - \bar{X})^2 / (n - 1)$.

Point Estimation and Its Limitations. The procedures of estimation
just discussed may be called estimation by a point. A single value is
given as the “best” estimate of the true or population value. Such a 
procedure does not provide a basis for specifying the degree of confidence
one may place in such an estimate. It is known, of course, from sampling
theory that the estimate made is not likely to be exactly equal to the
population value. With large hom [remainder of sentence illegible]

There are, of course, many occasions when a single value estimate is
needed, particularly for certain subsequent statistical analyses. In the
case of interval estimation, the single estimate is wanted as material
for a subsequent process of estimation.

Estimation by Interval. We cannot tell from any sample estimate 
whether it is too great or too small. For this purpose further samples 
from the same population would be needed. It seems obvious therefore, 
that what is required is an interval of some kind which may be expected 
to include or cover the true population value in a specified number of 
cases. From the sample value and other ancillary information, we can 
calculate the point values of the upper and lower limits of the interval
and then proceed to state that this interval will include or cover the
population value. From sampling theory we can calculate the number of
times in repeated sampling that the statement would be correct. Thus,
the proportion of cases in which the statement may be assumed to 
be correct provides a measure of the confidence to be ascribed to our 
statement. 

Fiducial Limits. R. A. Fisher (Refs. 10 and 8) first introduced the 
method of estimation based on the concepts of fiducial probability and 
fiducial limits. The basic ideas underlying Fisher’s theory may be 
presented as follows.

Observations in the experimental or observational sciences are con- 
crete and specific occurrences. They are now freely applied as a basis 
for probability statements about parameters whose exact values are 
unknown except for the information available in the observations. The 
kind of reasoning employed here comes from tests of significance, and the 
probability statements are designated as statements of fiducial probabil- 
ity, in order to distinguish such statements from those about “inverse 
probability." Fisher (Ref. 12) has indicated the fundamental random- 
variable relation which connects sample and population. The essential 
step in establishing this relation is in the following assertion: Irrespective 
of the character of the sample, the probability that the population param- 
eter shall fall in any range is derived from the known probability, P, 
which is defined as the function of the variable, and from the test or the 
pivotal quantity in the test of significance. The assertion requires only 
that the unknown parameter value shall fall in the range corresponding 
to these known quantities. In this sense is to be interpreted the some-
what paradoxical statement that a [sample] with known characteristics is a
[illegible] known population.

[Sample] statistics are derived from observations
which are defined as random variables involving parameters upon which
their distribution functions are dependent. These properties are used
to establish the connections between the probability distribution of the
random variable and the distribution of the statistic used as the pivotal
quantity in the test of significance. The statistic used as the pivotal
quantity is functionally independent of the population from which the
sample is drawn. This connection, once established, gives meaning to the
practical situation where the statistics are observable but the parameters
are unknown.

An illustration (Ref. 12) serves to show the application of this process 
of reasoning or form of fiducial argument. Following it, one may go 
from forms of statements embodying observations as random variables 
to forms of statements embodying observations as fixed data. In the 
former, the distribution functions include certain fixed but unknown 
parameters; in the latter, the frequency distributions are derived for the 
unknown parameters considered as random variables. 

Let $\xi$ be the median of a distribution concerning which the only thing
known is that its probability integral is continuous. Take the case
where n = 2, that is, where $X_1$ and $X_2$ are two observational values of the
variable X. For any given value of $\xi$, the facts are that the three
possibilities, that $X_1$ and $X_2$ (a) should both exceed the median, (b)
should lie on either side of it, or (c) should both be less than it, must occur
in the frequency ratio 1:2:1. If r stands for the number of observations
less than the median, then r becomes a pivotal quantity involving both
the unknown parameter and the observations with a sampling distribu-
tion independent of the parameter; that is, r takes the values 0, 1, and 2
with probabilities 1/4, 1/2, and 1/4, respectively. This leads to the fiducial
argument from the two given observations, now considered as fixed,
that the probability is .25 that $\xi$ is less than both $X_1$ and $X_2$;
.50 that $\xi$ lies between $X_1$ and $X_2$; and .25 that $\xi$ exceeds both $X_1$ and
$X_2$. This reasoning thus leads to a frequency distribution of $\xi$ now
considered as a random variable.

For a sample of any size, n, the following quantity expresses the
probability that the median shall exceed r of the observations and be less
than n - r:

$$\frac{n!}{r!\,(n - r)!\,2^n} \qquad (6.04)$$
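
The quantity in (6.04) is easily tabulated for any n. The sketch below is illustrative and the function name is an assumption. It lists the probabilities that the median exceeds exactly r of the n observations, for r = 0, 1, ..., n, and reproduces the values 1/4, 1/2, 1/4 of the case n = 2.

```python
from math import comb

def median_position_probabilities(n):
    """Probabilities n! / (r! (n - r)! 2^n) for r = 0, 1, ..., n."""
    return [comb(n, r) / 2 ** n for r in range(n + 1)]

print(median_position_probabilities(2))  # [0.25, 0.5, 0.25]

# For a larger sample the probabilities still sum to unity.
print(sum(median_position_probabilities(10)))
```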

Confidence Intervals. The complete theory underlying the method 
of interval estimation developed by Neyman (Ref. 25) cannot be presented 
here. However, the definition and use of the two concepts of confidence 
intervals and of confidence coefficients are presented briefly. 

Consider a sample of n random variables $X_1, X_2, \ldots, X_n$, the n
observational values. Denote by E the set of values of the X variables.
This set can be represented by a point, called the sample point E, in an
n-dimensional space, the rectangular coordinates of E being $X_1, X_2, \ldots, X_n$.
Assume that the probability law of the sample $X_1, X_2, \ldots, X_n$,
though known, is given in terms of two parameters $\theta_1$ and $\theta_2$, which are
unknown. It is desired to make an estimate of one of the parameters,
say $\theta_1$.

The process of estimating $\theta_1$ consists in constructing two functions
of the observations, $\underline{\theta}(E)$ and $\bar{\theta}(E)$, and in estimating the parameter to
be within the interval: $\delta(E) = [\underline{\theta}(E), \bar{\theta}(E)]$. It is important to point out
certain properties of the functions $\underline{\theta}$ and $\bar{\theta}$. Since they are functions
of the sample values, $X_1, X_2, \ldots, X_n$, they are both random variables
and will vary from sample to sample as the sample point $X_1, X_2, \ldots, X_n$
varies. Since they are random variables, the probabilities of $\underline{\theta}$ and $\bar{\theta}$ lying
within or without any specified limit may be considered.

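Denote by $\theta_1^0$ the true value of the parameter $\theta_1$ in a particular problem.
Then $\underline{\theta}(E)$ and $\bar{\theta}(E)$ should have this property: the probability that, when
$\theta_1^0$ and $\theta_2$ are the true values of the two parameters, $\underline{\theta}(E)$ is less than $\theta_1^0$
and $\bar{\theta}(E)$ is greater than $\theta_1^0$ is equal to $\alpha$; that is,

$$P[\underline{\theta}(E) < \theta_1^0 < \bar{\theta}(E) \mid \theta_1^0, \theta_2] = \alpha \qquad (6.05)$$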


The interval extending from $\underline{\theta}(E)$ to $\bar{\theta}(E)$ in (6.05) is called the con-
fidence interval corresponding to the sample point E, and the value $\alpha$
(for example, 0.95, or 0.99 . . .), the confidence coefficient. What is
required in (6.05) is a probability of a specified value, whatever the values
of $\theta_1$ and $\theta_2$, calculable from the probability law depending on $\theta_1$ and $\theta_2$.
Thus the functions $\underline{\theta}$ and $\bar{\theta}$ must satisfy (6.05), also identically for all
possible values of $\theta_2$.

The meaning of the confidence interval may be said to be this: Assume
that a large number of samples are drawn randomly from a population
obeying the specified elementary probability law. If in each case the
statement is made that $\theta_1$ is included in the interval $[\underline{\theta}(E), \bar{\theta}(E)]$, then the
relative frequency of correct statements will be approximately equal
to the confidence coefficient, $\alpha$. For example, take $\alpha = 0.95$. If 100
samples are taken and 100 confidence intervals set up, it may be expected
that 95 per cent of these intervals will include or cover the true value,
say $\theta_1^0$. It should be noted that this statement is not equivalent to the
statement that the probability is 95 out of 100 that $\theta_1^0$ lies between the
limits $\underline{\theta}$ and $\bar{\theta}$. This discrepancy is explained by the fact that $\theta_1$ is not a
random variable but an unknown constant. Consequently, the probabil-
ity of $\theta_1^0$ falling within specified limits may be either zero or unity, depend-
ing on whether the actual value of $\theta_1$ falls without or within the limits.
Further development of the theory (Ref. 24) indicates that there 
exists an infinite number of confidence intervals for a given confidence 
coefficient. Hence, some principle is needed as a basis for choosing from 
among them. One principle is to select the shortest system of intervals. 
Shortest confidence intervals, however, exist to a considerable extent only
in exceptional cases. Other principles, such as unbiasedness, have been
used; but even shortest unbiased confidence intervals exist in only a
restricted class of cases. A third type of interval has been called the
“short-unbiased” confidence interval. If there is more than one param-
eter, there is not often a confidence interval for one of the parameters
which is independent of the other parameters. With more than one
parameter the set of points constitutes a simple closed region, if it exists,
rather than a single interval as in the case of only one parameter. In
the case of several parameters, new problems arise. But the description 
of the basic ideas has been given in the situation described above. 
Fiducial versus Confidence Intervals. It appears that Fisher’s theory 
of fiducial probability and Neyman’s theory of confidence intervals are 
closely related and that in a number of practical cases they may lead to 
the same form of procedure. The authors, however, indicate a
disagreement in the logical foundations as well as in certain practical applications.
Neyman (Ref. 23) has attempted to develop a general procedure which 
will supply rules for setting up from observational data an interval 
that will cover the unknown parameter with a given probability. Fisher 
(Ref. 7) indicates that a unique probability measure associated with a 
particular interval is needed. This measure is defined as a fiducial 
probability. An essential point of agreement is in the interpretation 
that the probability of, say, 0.95 is not the probability that the parameter 
estimated lies between any fixed limits but, rather, that a variable state- 
ment about this parameter formulated in accordance with a specified rule 
will be correct. Fisher expresses it by stating that there is a fiducial 
probability of 95 per cent of the unknown parameter’s lying within the
specified fiducial limits. According to Neyman, the statement would
be made that the specified interval will cover the true value and that we
know that the statement will be correct 95 times out of 100.

Fisher (1935) has emphasized that a fiducial statement can be made 
only in terms of the estimate if the estimate of the unknown parameter 
has the property of sufficiency, because only in this case does the estimate
elicit the whole of the available information. Neyman’s confidence
intervals are apparently of more general applicability. When an estimate
is sufficient, both the fiducial limits and the limits of Neyman's shortest 
confidence interval or of his short unbiased confidence interval depend 
on this property of sufficiency. The interval would not, however, always 
be the same in the two cases because of the use by Neyman of an addi- 
tional principle in the determination of his intervals. 

It would appear, however, that the two procedures would be
interchangeable in at least the first two examples that follow.


PROBLEMS OF INTERVAL ESTIMATION


Problem VI.1. Estimation of the population mean. The first prob-
lem consists in estimating $\mu$, the mean of a normal population of known
variance $\sigma^2$, given a sample mean $\bar{X}$ based on n items.

From our study of sampling theory, we know that the means of
random samples from a normal population, for example, the $\bar{X}$'s, are
normally distributed about $\mu$ with a standard deviation (called the
standard error of the mean, $\sigma_{\bar{X}}$) equal to $\sigma/\sqrt{n}$. Hence, we know the
proportion of sample means which will lie within the interval: $\mu \pm$ some
multiple of $\sigma_{\bar{X}}$. The confidence interval may be written as

$$\bar{X} - y_\alpha \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + y_\alpha \frac{\sigma}{\sqrt{n}} \qquad (6.06)$$

where $y_\alpha$ is the value of

$$\frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma}$$

for a given confidence coefficient $\alpha$, which can be read from a normal
probability integral table. If $\alpha = 0.99$, then $y_\alpha = 2.576$. If $\alpha = 0.95$,
then $y_\alpha = 1.96$ no matter what n is. For example, we find that 99 per
cent of the sample means will fall within the interval $\mu \pm 2.576\sigma_{\bar{X}}$, and
95 per cent within the interval $\mu \pm 1.96\sigma_{\bar{X}}$. On the basis of sampling
theory, if in repeated sampling we take the interval extending from a
lower limit of $\bar{X} - 2.576\sigma_{\bar{X}}$ to an upper limit of $\bar{X} + 2.576\sigma_{\bar{X}}$, then this
interval will cover the population mean, $\mu$, in 99 per cent of cases.

We may take as a practical illustration the 100 samples of 5 items
each drawn from the population with $\mu = 30$, $\sigma = 10$. For samples of
5 numbers $\sigma_{\bar{X}}$ will be

$$\frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{5}} = 4.472$$

Using a 95 per cent confidence coefficient, we take the intervals extending
from $\bar{X} - (1.96)(4.472)$ to $\bar{X} + (1.96)(4.472)$, or from $\bar{X} - 8.77$ to
$\bar{X} + 8.77$. We calculated these intervals for the 100 sample means
given in Table 3, page 34. They are recorded in Table 31. It is
noted that only one of the 100 intervals calculated, namely, 10.63 to 28.17,
does not include the population mean 30.

TABLE 31
CONFIDENCE INTERVALS FOR THE MEANS OF 100 RANDOM SAMPLES OF 5, USING A
CONFIDENCE COEFFICIENT OF 95 PER CENT
(Population $\mu = 30$; $\sigma = 10$)

The sampling experiment was repeated by taking random samples of
size 50 instead of 5. The means of 100 samples of 50 items each were
calculated. Again, the intervals were set up by using a confidence
coefficient of 95 per cent, which in this case extended from $\bar{X} - (1.96)(1.4142)$
to $\bar{X} + (1.96)(1.4142)$, or from $\bar{X} - 2.77$ to $\bar{X} + 2.77$. We
found that the population mean 30 was covered in 97 of the 100 cases.

An extension of the sampling experiment was made to obtain the
means of 100 samples of 100 items each. The confidence intervals with a
confidence coefficient of 95 per cent were calculated again, given by the
limits $\bar{X} - 1.96$ and $\bar{X} + 1.96$. We noted that the population mean 30
was covered in 96 of the 100 cases.

In all three of the sampling experiments, therefore, there was a close
agreement between theory and observation. We noted also that the
confidence intervals become shorter as the size of the sample is increased.
Therefore, the larger the sample, the more accurately can the true or
population value be estimated.
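The sampling experiment described above can be imitated with a short simulation. The following is only a sketch under stated assumptions: it draws fresh pseudo-random normal samples (not the samples of Table 3), and the function name `coverage` and the seed are illustrative choices. The count of covering intervals will therefore vary from run to run and need not equal the 99, 97, and 96 reported in the text.

```python
import random
from statistics import NormalDist, fmean


def coverage(n, mu=30.0, sigma=10.0, level=0.95, n_samples=100, seed=0):
    """Return (half_width, hits) for n_samples intervals X-bar +/- y * sigma / sqrt(n)."""
    rng = random.Random(seed)
    # Normal deviate for the chosen confidence coefficient (about 1.96 for 0.95).
    y = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = y * sigma / n ** 0.5
    hits = 0
    for _ in range(n_samples):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        # Count the interval if it covers the population mean.
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return half_width, hits


for size in (5, 50, 100):
    hw, hits = coverage(size)
    print(f"n = {size:3d}: interval X-bar +/- {hw:.2f}, covered mu in {hits} of 100 samples")
```

The half-widths reproduce the 8.77, 2.77, and 1.96 of the text; only the coverage counts depend on the particular random samples drawn.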

Problem VI.2. Estimation of the population mean of a normal population
of unknown variance. Nearly always, in experimental work,
neither the mean nor standard deviation of the population from which
we are sampling is known. In estimating the population mean in such
cases, we have to use the mean and standard deviation of the sample
and the distribution of $t$. We shall calculate the fiducial values according
to Fisher (Ref. 5, pp. 195-198).

A fundamental principle in the use of the $t$-distribution for the solution
of this problem is: If an estimate of a parameter is normally distributed
with a variance which can be estimated from the sample and the
distribution of which is independent of the estimate of the parameter, then
fiducial limits can be calculated from "Student's" ratio.

The following are the characteristics of $t$ which give it its unique
utility for the solution of this type of problem:

(a) The distribution of $t$ is known with exactitude, without any
supplementary assumptions or approximations.

(b) $t$ is given by the single unknown parameter, $\mu$, and by observable
statistics only.

(c) The statistics involved in the quantity $t$ are sufficient.

The quantity $t$ is expressed by:

$$t = \frac{(\bar{X} - \mu)\sqrt{n}}{s} \tag{6.07}$$

where

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

It is noted that since all terms on the right-hand side of (6.07) except $\mu$
are observable, the fiducial values of $\mu$ are determinable when values of $t$
appropriate to any chosen level of significance, $\epsilon$, have been chosen.
Furthermore, $\bar{X}$ and $s^2$ are independently distributed; and the two
quantities, the sum and the sum of squares, calculated from the data
are sufficient statistics, since they contain all the relevant information
concerning the mean and variance of the hypothetical normal curve.

Therefore, we may write

$$\mu = \bar{X} \pm t_\epsilon \frac{s}{\sqrt{n}} \tag{6.08}$$

as the corresponding fiducial limits for the value of $\mu$. With respect to $\mu$,
it may then be said that the fiducial probability is $(1 - \epsilon)$ that it will lie
within these fiducial limits.

As a practical illustration, we may set up the fiducial limits of the true
mean difference based on the data from the controlled experiment given
in Problem V.6, page 75, in which the null hypothesis was rejected at the
5 per cent level.

The following quantities were obtained:

$$\bar{X} = 9.28, \qquad n = 25, \qquad \sum (X - \bar{X})^2 = 6809.04$$

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{6809.04}{24} = 283.71$$

We wish to set up fiducial limits with a fiducial probability of 95 per cent.
Accordingly,

$$t_\epsilon = t_{.05}, \qquad \pm t_{.05} = 2.064 \quad (\text{for } n - 1 = 24)$$

and

$$\mu = \bar{X} \pm t_{.05}\frac{s}{\sqrt{n}} = 9.28 \pm (2.064)(3.368) = 2.33 \text{ or } 16.23 \tag{6.09}$$

With respect to the population mean $\mu$, it may be said that it has a
fiducial probability of 2.5 per cent of being less than 2.33 or of being
greater than 16.23, and, in the same sense, a probability of 95 per cent of
lying within these fiducial limits.
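The arithmetic of this illustration can be checked with a few lines of code. This is a minimal sketch: the function name `fiducial_limits_mean` is an illustrative choice, and the critical value 2.064 is taken from the text (the tabled 5 per cent value of t for 24 degrees of freedom) rather than computed.

```python
from math import sqrt


def fiducial_limits_mean(xbar, sum_sq_dev, n, t_crit):
    """Limits xbar +/- t * s / sqrt(n), with s^2 = sum of squared deviations / (n - 1)."""
    s2 = sum_sq_dev / (n - 1)
    std_error = sqrt(s2 / n)
    return xbar - t_crit * std_error, xbar + t_crit * std_error, s2, std_error


# Values quoted in the text for the data of Problem V.6.
lower, upper, s2, std_error = fiducial_limits_mean(9.28, 6809.04, 25, 2.064)
print(f"s^2 = {s2:.2f}, standard error = {std_error:.3f}")
print(f"95 per cent limits: {lower:.2f} to {upper:.2f}")
```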

Problem VI.3. Estimation of the population variance from the sample
value. If the experimenter is interested in determining whether the
variance or the standard deviation of a normal distribution could exceed
a given value or could lie in a given range, a test of significance is needed
for which the pivotal quantity should possess the following characteristics:

(a) Its exact sampling distribution must be known.

(b) It must be expressible in terms of the unknown variance, $\phi$, of the
distribution sampled, together with known statistics only.

(c) The statistics involved in the expression of the quantity must be
sufficient.

It is known that $\dfrac{(n-1)s^2}{\phi}$ is distributed as is $\chi^2$ for $n - 1$ degrees
of freedom. That is, if $\dfrac{\chi^2}{n-1}$ is the ratio of the estimate of the population
variance as obtained from the sample for $n - 1$ degrees of freedom to the
true variance, $\phi$, then $\chi^2$ is distributed, independently of the population
mean and variance, in a distribution determinable from the number of
degrees of freedom (Ref. 6).

The upper and lower hundred $\epsilon$ per cent fiducial limits of $\phi$ can be
obtained from tables of the $\chi^2$-distribution. If the two critical values of
$\chi^2$ are represented by $\chi_1^2$ and $\chi_2^2$, the fiducial range of $\phi$ will be the interval

$$\left[\frac{(n-1)s^2}{\chi_1^2},\ \frac{(n-1)s^2}{\chi_2^2}\right]$$

As our practical illustration, we set up the fiducial limits of the
variance of the distribution based on the data in Problem V.6. If we
take $\epsilon = .05$ for the probable lower limit of the value of $\phi$, then for $n - 1 = 24$,
$\chi^2$ exceeds 36.415 in only 5 per cent of trials (see Table III, Appendix).

Substituting this value of $\chi^2$ in the equation

$$\chi^2 = \frac{\sum (X - \bar{X})^2}{\phi} = \frac{6809.04}{\phi} \tag{6.10}$$

we have

$$\phi = \frac{6809.04}{36.415} = 186.98$$

Similarly, the probable upper limit to the value of $\phi$ is obtainable by
first noting that $\chi^2$ for $n - 1$ degrees of freedom falls below the value 13.848
in only 5 per cent of trials. Substituting this value for $\chi^2$ in Equation
(6.10), we get

$$\phi = \frac{6809.04}{13.848} = 491.70$$

We may say, then, that the fiducial probability is 5 per cent that the
variance should exceed 491.7 or be less than 186.98, and, in the same sense,
a fiducial probability of 90 per cent of the variance lying within these
fiducial limits. If a linear measure of variation is wanted, the corresponding
fiducial limits for the population standard deviation are 22.17 and
13.67, respectively.
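The two limits follow directly from Equation (6.10). The sketch below reproduces them; the function name `variance_limits` is an illustrative choice, and the two chi-square values (36.415 and 13.848 for 24 degrees of freedom) are the tabled values quoted in the text, not computed here.

```python
from math import sqrt


def variance_limits(sum_sq_dev, chi2_upper_point, chi2_lower_point):
    """Divide the sum of squared deviations by each tabled chi-square value."""
    lower = sum_sq_dev / chi2_upper_point  # large chi-square gives the lower limit
    upper = sum_sq_dev / chi2_lower_point  # small chi-square gives the upper limit
    return lower, upper


var_lower, var_upper = variance_limits(6809.04, 36.415, 13.848)
print(f"variance limits: {var_lower:.2f} to {var_upper:.2f}")
print(f"standard deviation limits: {sqrt(var_lower):.2f} to {sqrt(var_upper):.2f}")
```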

Problem VI.4. Estimation of an individual's true score from his obtained
score on a test. We assume that the scores an individual would
obtain on a very large number of equivalent tests are distributed in a
normal manner about his true score with a standard deviation equal to the
standard error of an individual score, $\sigma_e = s\sqrt{1 - r}$, where $s$ is the
standard deviation of the distribution of scores and $r$ is the reliability
coefficient of the test. The upper and lower limits of the confidence
interval of his true score, $\xi$, are given by

$$X \pm y_\alpha \left(s\sqrt{1 - r}\right) \tag{6.11}$$

where $y_\alpha$ is the value of $y = (X - \xi)/\sigma_e$ for a given confidence coefficient,
$\alpha$, which is read from the normal probability integral table; $X$ is the
obtained score; and $\sigma_e$ is the standard error of $X$.

As an illustration, let us set up the confidence interval for the true
score of a pupil who receives an I.Q. rating of 105 on a particular intelligence
test on which the standard error of an individual score is 4 I.Q.
points. Using a confidence coefficient of 98 per cent, the upper and
lower limits of the confidence interval are, respectively, 105 + (2.326)(4)
= 114.3 and 105 - (2.326)(4) = 95.7. We then state that the interval
(95.7, 114.3) will cover the true I.Q. score of this individual, and we know
that our statement concerning the true score, $\xi$, will be correct in 98 per
cent of the cases.

Problem VI.5. Estimation of the confidence interval for the population
median in samples from any continuous population. We have considered
the sampling distributions of certain statistics calculated from
samples involving only one of the unknown parameters specifying
the population sampled. The method of interval estimation
was used to set up in terms of the observations at any level the confidence
interval for the population parameter.

… (Ref. 29) independently obtained the
confidence interval for the median without reference to the form of parent
population … in which the population form is unknown or, as
in small samples, in which it is … an assumption of normality …
interval estimation of the median as a measure of location …
Nair (Ref. 22) used the … restricted to continuous populations, to construct a table of confidence
intervals for the median, the use of which makes the problem of estimation
extremely simple.

In a random sample of $n$ observations $X_1, X_2, \ldots, X_k, \ldots, X_n$
arranged in ascending order of magnitude, if $P_k$ is the probability integral
of $X_k$, then

$$P(X < X_k) = P(P < P_k) = \frac{1}{B(k,\ n-k+1)}\int_{P}^{1} p^{\,k-1}(1 - p)^{\,n-k}\,dp = I_{1-P}(n - k + 1,\ k) \tag{6.12}$$

where $I_x(p, q)$ is the function tabulated in the Incomplete Beta Function
Table. By definition, the probability integral corresponding to the
median, $M$, is $\tfrac{1}{2}$. Therefore,

$$P(M < X_k) = P(\tfrac{1}{2} < P_k) = I_{0.5}(n - k + 1,\ k)$$

Also,

$$P(M < X_k) = P(M > X_{n-k+1})$$

Hence,

$$P(X_k < M < X_{n-k+1}) = 1 - 2I_{0.5}(n - k + 1,\ k) \tag{6.13}$$

which is the confidence interval of the population median. It states that
the unknown population median will lie in the interval extending from the
$k$th to the $(n - k + 1)$th observation in $100[1 - 2I_{0.5}(n - k + 1,\ k)]$
per cent of the cases.
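The incomplete beta value in (6.13) can be evaluated without tables, because for integer arguments $I_{0.5}(n - k + 1, k)$ equals the probability that a binomial variable with $n$ trials and probability one-half does not exceed $k - 1$. The sketch below relies on that identity; the function names `tail` and `median_rank` are illustrative and are not Nair's.

```python
from math import comb


def tail(n, k):
    """I_0.5(n - k + 1, k), computed as P(Binomial(n, 1/2) <= k - 1)."""
    return sum(comb(n, j) for j in range(k)) / 2 ** n


def median_rank(n, level):
    """Largest k for which 1 - 2 * tail(n, k) is still at least `level` (0 if none)."""
    k = 0
    while k + 1 <= n // 2 and 1 - 2 * tail(n, k + 1) >= level:
        k += 1
    return k


for level in (0.95, 0.99):
    k = median_rank(25, level)
    print(f"n = 25, level {level}: k = {k}, tail = {tail(25, k):.4f}, "
          f"coefficient = {1 - 2 * tail(25, k):.4f}")
```

For $n = 25$ this gives $k = 8$ at the 0.95 level and $k = 6$ at the 0.99 level, so the interval runs from the $k$th to the $(n - k + 1)$th ordered observation.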

With the aid of the Incomplete Beta Function Tables, Nair (Ref. 22) 
prepared the Table of Confidence Intervals for the Median for values of n 
from 6 through 81, for confidence coefficients of 0.95 and 0.99. This 
table consists in finding k such that, given n, 


$$I_{0.5}(n - k + 1,\ k) = 0.025 \text{ or } 0.005$$

Since $k$ can have only integral values, the confidence coefficient cannot
be fixed exactly at 0.95 or 0.99 for all values of $n$. Values of $k$ are taken
which bring the confidence coefficient

$$1 - 2I_{0.5}(n - k + 1,\ k)$$

nearest to (and greater than) the conventional values of 0.95 or 0.99.
For values of n larger than 81, Nair (Ref. 22) suggests the use of the 
normal curve as an approximation where т, the relative deviate, is given 


by 
ӨШ. n — 2k (6.14) 


For a given confidence coefficient, such as 0.95 or 0.99, the correspond- 
ing value of z can be obtained from the Normal table, and the value of k 
can be determined from the relation 


ea 2® (6.15) 
As an illustration of the use of Nair's Tables, we shall set up the
confidence interval for the population median, M, using the sample data
given in Problem V.6, page 75. The sample median, Md, of the
individual pair differences is 10; n = 25.

We enter Nair's Table given in (Ref. 22) with n = 25. The
arguments and values for n = 25 given in the table are as follows:


          Confidence coefficient ≥ 0.95        Confidence coefficient ≥ 0.99

  n       k     I_0.5(n - k + 1, k)            k     I_0.5(n - k + 1, k)

  25      8          .0216                     6          .0020

For a sample of 25 observations, we can say that, with a confidence
coefficient of 95.68 per cent {100[1 - 2(.0216)]}, the population median M
will lie between the eighth and eighteenth ranked observations, that is,
between 17 and 3. We can also say that, with a confidence coefficient of
99.6 per cent, the population median, M, will lie between the sixth and
twentieth ranked observations, that is, between 24 and -6.
Problem VI.6. Setting up the confidence interval on a population
difference from a given sample difference in percentages. If percentages
are obtained within a sample, that is, the percentages of "yes" and
"no" answers to a given question, the problem arises of how to get
the confidence interval of the population difference, $\bar{d}$, for a given sample
difference, $d$. Wilks (Ref. 32) gives the 99 per cent sampling limits of $d$ as

$$\bar{d} \pm \frac{2.58}{\sqrt{n}}\sqrt{100(\bar{P}_1 + \bar{P}_2) - \bar{d}^{\,2}} \tag{6.16}$$

That is, in drawing repeatedly random samples of size $n$ from a population
in which the "yes" and "no" percentages are $\bar{P}_1$ and $\bar{P}_2$, respectively,
approximately 99 per cent of the samples have a difference $d$ which lies
between these limits. In practice, sample values $d$, $P_1$, and $P_2$ are
substituted for the unavailable $\bar{d}$, $\bar{P}_1$, and $\bar{P}_2$. This procedure may be
satisfactory for practical purposes.

Wilks gives the quantity $258/\sqrt{n}$ as a simple, conservative critical
limit for the difference $d$. If $\pm d$ is larger than $258/\sqrt{n}$, the
probability is at least 0.99 that $\bar{d}$, the population difference, would be
included between two positive confidence limits. The more common
interpretation is that at the 1 per cent level of significance a
difference between the "yes" and "no" percentages exists in the
population.

As an illustrative problem, let us test the significance of the difference
between the two percentages in a random sample of 77 graduates of a
teachers' college, 22 of whom remained in the teaching profession
and 55 of whom left it within 10 years after graduation. The approximate
test of significance given by the pivotal quantity $258/\sqrt{n}$ shows
that $d$, or 42.8 per cent, exceeds $258/\sqrt{77} = 29.4$ and hence is significant at the
1 per cent level. The 99 per cent confidence interval is given by substituting
the sample values for $\bar{d}$, $\bar{P}_1$, and $\bar{P}_2$ in (6.16). Thus:

$$d \pm \frac{2.58}{\sqrt{77}}\sqrt{100(71.4 + 28.6) - (42.8)^2} = 42.8 \pm 26.5$$

or the confidence interval of the population difference, $\bar{d}$, with a confidence
coefficient of 99 per cent is (16.3, 69.3).

Problem VI.7. Setting up a confidence interval of a population difference
from the difference between two sample percentages. The
problem of comparing two percentages in different random samples differs
from that in Problem VI.6 in that in the latter there is a negative correlation
between the percentages of "yes" and "no" answers. No correlation
exists in the percentages in the two different samples. Wilks (Ref.
32) gives 99 per cent sampling limits of $d$ as

$$\bar{d} \pm 2.58\sqrt{\frac{\bar{P}_1(100 - \bar{P}_1)}{n_1} + \frac{\bar{P}_2(100 - \bar{P}_2)}{n_2}} \tag{6.17}$$

in which $\bar{d}$, $\bar{P}_1$, and $\bar{P}_2$ are the population values and $n_1$ and $n_2$ are the
sizes of the two samples, and the corresponding conservative critical limit for $d$ as

$$129\sqrt{\frac{n_1 + n_2}{n_1 n_2}} \tag{6.18}$$

If instead of calculating $d$, the difference between the percentages $P_1$
and $P_2$, we first transform the percentages to the inverse sine function
(see page 164), then

$$d' = 100\left(\sin^{-1}\sqrt{P_1} - \sin^{-1}\sqrt{P_2}\right) \tag{6.19}$$

Then $129\sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}$ is, to a close approximation at the 1 per cent level,
an exact critical limit of $d'$.

As an illustration, we shall set up the 99 per cent confidence interval
for the population difference from the two samples of percentages of
color-blindness in the two sexes of the Caucasoid population (see Problem
V.8, page 80).

From the data,

$$P_1 = 8.4, \qquad P_2 = 1.3, \qquad d = 7.1$$

The 99 per cent confidence interval is obtained by substituting the sample
values $P_1$, $P_2$, and $d$ in (6.17):

$$d \pm 2.58\sqrt{\frac{(8.4)(91.6)}{793} + \frac{(1.3)(98.7)}{232}} = 7.1 \pm 3.18$$

Therefore, the 99 per cent confidence interval of the population difference
is (3.9, 10.3).

Again using Formula (6.19), we get¹

$$d' = 100\left(\sin^{-1}\sqrt{.084} - \sin^{-1}\sqrt{.013}\right) = 100(16.8 - 6.4) = 1040$$

and

$$129\sqrt{\frac{n_1 + n_2}{n_1 n_2}} = 129\sqrt{\frac{1025}{183{,}976}} = 9.6$$

Since 1040 > 9.6, the probability is at least 0.99 that a genuine difference
between the percentages exists in the population.
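The limits of Problems VI.6 and VI.7 can be computed with the sketch below. It assumes the forms of (6.16), (6.17), and (6.18) as read from the worked examples, with sample percentages substituted for the population values; the function names are illustrative. With the multiplier 2.58 the half-width for Problem VI.6 comes out near 26.6, slightly above the 26.5 quoted in the text.

```python
from math import sqrt


def within_sample_limits(p1, p2, n, z=2.58):
    """Difference, half-width of (6.16), and the conservative limit 100*z/sqrt(n)."""
    d = p1 - p2
    half_width = z / sqrt(n) * sqrt(100 * (p1 + p2) - d ** 2)
    return d, half_width, 100 * z / sqrt(n)


def two_sample_limits(p1, n1, p2, n2, z=2.58):
    """Difference, half-width of (6.17), and the conservative limit (6.18)."""
    d = p1 - p2
    half_width = z * sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)
    return d, half_width, 50 * z * sqrt((n1 + n2) / (n1 * n2))


# Problem VI.6: 71.4 and 28.6 per cent within one sample of 77.
d6, half6, crit6 = within_sample_limits(71.4, 28.6, 77)
print(f"VI.6: d = {d6:.1f}, half-width = {half6:.2f}, critical limit = {crit6:.1f}")

# Problem VI.7: 8.4 per cent of 793 and 1.3 per cent of 232.
d7, half7, crit7 = two_sample_limits(8.4, 793, 1.3, 232)
print(f"VI.7: d = {d7:.1f}, half-width = {half7:.2f}, critical limit = {crit7:.1f}")
```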
Problem VI.8. Setting up the confidence limits for an individual
estimated score. In problems of estimating or predicting a measure of a
characteristic from a knowledge of one or more other characteristics, the
predicted values are subject to error. Here we can use the confidence
interval to show the accuracy of individual estimates and the confidence
that may be placed in the statements made about individuals. We
shall take the case of simple regression, that is, the prediction of one
characteristic from a knowledge of another.² The data are from Problem
V.13, page 88, and we shall set up the confidence interval for each
individual's estimated score from the regression equation, using a
confidence coefficient of 98 per cent. The basic calculations are given in
Table 32.

The standard error of the estimate $Y_E$ for a particular value of $X$, say
$X_0$, is given by

$$s_{Y_E} = \sqrt{\frac{s_Y^2(1 - r^2)}{N - 2}\left[1 + \frac{(X_0 - \bar{X})^2}{s_X^2}\right]} \tag{6.20}$$

where $s_{Y_E}$ denotes the standard error of $Y_E$, $N$ is the number of pairs of
observations, and the other quantities have their customary meanings.

From the formula, it is noted that the errors of the estimates of $Y$
increase as the quantity $X_0$ departs from the mean of the $X$-distribution;
also that as the values of $r$ and $s_X$ become larger, the smaller become the
errors of estimation, other factors being equal.

From Problem V.13, we record the following values:

$$Y_E = .9873X - 0.68$$
$$s_X^2 = 157.30$$
$$s_Y^2 = 172.59$$
$$r^2 = 0.8885$$
$$\bar{X} = 53.24$$
$$N = 25$$
¹ Transformation obtained from Fisher and Yates's Table XII, page 56 of Ref. 13
in Chapter VII.

² For the multivariate case see page 343.

TABLE 32
STANDARD ERRORS OF ESTIMATED VALUES OF Y FOR DIFFERENT VALUES OF X_0 WITH
CORRESPONDING 98 PER CENT CONFIDENCE INTERVALS

                                                                  Independent variable arranged in
                                                                  descending order of magnitude
Individual    X     Y     Y_E     s_YE    2.5 s_YE   Interval      X      s_YE    t.02 s_YE
   (1)       (2)   (3)    (4)     (5)       (6)        (7)        (8)     (9)       (10)
    1        46    52    44.74   1.056     2.64         …         73     1.707     4.27
    2        38    38    36.84   1.439     3.60         …         73     1.707     4.27
    3        64    63    62.51   1.205     3.01         …         67     1.357     3.39
    4        73    65    71.39   1.707     4.27         …         66     1.305     3.26
    5        61    58    59.55   1.075     2.69         …         66     1.305     3.26
    6        34    33    32.89   1.675     4.19         …         64     1.205     3.01
    7        57    49    55.60   0.955     2.39         …         64     1.205     3.01
    8        66    63    64.48   1.305     3.26         …         61     1.075     2.69
    9        25    24    24.00   2.255     5.64         …         61     1.075     2.69
   10        30    26    28.94   1.926     4.81         …         59     1.006     2.51
   11        45    33    43.75   1.094     2.73         …         57     0.955     2.39
   12        73    71    71.39   1.707     4.27         …         55     0.923     2.31
   13        45    43    43.75   1.094     2.73         …         55     0.923     2.31
   14        55    63    53.62   0.923     2.31         …         52     0.919     2.30
   15        66    70    64.48   1.305     3.26         …         51     0.929     2.32
   16        49    46    47.70   0.965     2.41         …         50     0.944     2.36
   17        64    65    62.51   1.205     3.01         …         49     0.965     2.41
   18        45    46    43.75   1.094     2.73         …         46     1.056     2.64
   19        61    62    59.55   1.075     2.69         …         45     1.094     2.73
   20        52    46    50.66   0.919     2.30         …         45     1.094     2.73
   21        67    68    65.47   1.357     3.39         …         45     1.094     2.73
   22        59    53    57.57   1.006     2.51         …         38     1.439     3.60
   23        55    55    53.62   0.923     2.31         …         34     1.675     4.19
   24        51    52    49.67   0.929     2.32         …         30     1.926     4.81
   25        50    48    48.68   0.944     2.36         …         25     2.255     5.64


Substituting these values in Equation (6.20) and using the $X_0$ for
each of the 25 individuals, we obtain the $s_{Y_E}$ for each individual. These
values are recorded in column (5), Table 32. Using the confidence
coefficient of 98 per cent, we find from the $t$-table that the value of $t_{.02}$
for $n = N - 2 = 23$ is 2.5. Therefore, for any particular value of $X_0$
the upper and lower limits of the confidence interval will be $Y_E + 2.5s_{Y_E}$
and $Y_E - 2.5s_{Y_E}$. The values of $2.5s_{Y_E}$ are given in column (6), and the
values for the confidence interval, in column (7).

In column (10) the values of $t_{.02}s_{Y_E}$ have been recorded for values of
$X_0$ [column (8)] arranged in descending order of magnitude. It is clear
from column (9) that the errors of estimation increase considerably as
the value of $X_0$ recedes from the mean of the distribution of $X$.
Correspondingly, the confidence intervals widen and reflect the increase
in the errors of estimation.
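The widening of the interval as $X_0$ moves away from the mean can be seen by evaluating the standard error for a few values of $X_0$. The sketch below assumes the standard error has the form $\sqrt{\frac{s_Y^2(1-r^2)}{N-2}\left[1 + \frac{(X_0-\bar{X})^2}{s_X^2}\right]}$, which reproduces the tabled standard errors in column (9), and takes the summary values as $s_X^2 = 157.30$, $s_Y^2 = 172.59$, $r^2 = 0.8885$, $\bar{X} = 53.24$, $N = 25$. The function names are illustrative.

```python
from math import sqrt


def se_estimate(x0, s2_y=172.59, s2_x=157.30, r2=0.8885, xbar=53.24, n=25):
    """Standard error of the estimated Y at X0 under the assumed formula."""
    return sqrt(s2_y * (1 - r2) / (n - 2) * (1 + (x0 - xbar) ** 2 / s2_x))


def interval(x0, t=2.5, slope=0.9873, intercept=-0.68):
    """98 per cent limits Y_E -/+ t * s_YE for the regression line of the text."""
    y_e = slope * x0 + intercept
    half_width = t * se_estimate(x0)
    return y_e - half_width, y_e + half_width


for x0 in (73, 52, 25):
    lower, upper = interval(x0)
    print(f"X0 = {x0}: s_YE = {se_estimate(x0):.3f}, interval {lower:.2f} to {upper:.2f}")
```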


CONFIDENCE LIMITS AND TOLERANCE LIMITS


A distinction should be made between confidence limits and tolerance
limits. The latter has proved to be a useful statistical concept in the
quality control of manufacturing products and probably can be applied to
other fields.

The problem in setting up tolerance limits is that of determining
limits from the sample information which will include, on the average, a
specified proportion of the universe or population between them (Ref. 33).

For an example, let $\bar{X}$ be the sample mean and

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

the sample variance estimate in a sample of size $n$. The tolerance
limits $L_1$ and $L_2$, which between them will include, on the average, a
proportion $\alpha$ of the universe, are given by

$$\bar{X} \pm t_\alpha s\sqrt{\frac{n + 1}{n}} \tag{6.21}$$

The value of $t_\alpha$ can be obtained from the table of the $t$-distribution,
when the value of $\alpha$ has been specified, for example as 99 per cent, 95 per
cent, or whatever, and $n - 1$ is the number of degrees of freedom. In
contrast, the confidence limits $\bar{X} \pm t_\alpha s/\sqrt{n}$ may be said to include the
population mean, $\mu$, with a confidence coefficient of $\alpha$.
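The contrast between the two kinds of limits can be made concrete numerically. The sketch below is only illustrative: it assumes the tolerance limits take the form $\bar{X} \pm t_\alpha s\sqrt{(n+1)/n}$ and the confidence limits the form $\bar{X} \pm t_\alpha s/\sqrt{n}$, and it borrows the sample values of Problem VI.2 ($\bar{X} = 9.28$, $s^2 = 283.71$, $n = 25$, $t = 2.064$) purely as example inputs.

```python
from math import sqrt


def half_widths(s, n, t):
    """Return (tolerance half-width, confidence half-width) under the assumed forms."""
    tolerance = t * s * sqrt((n + 1) / n)
    confidence = t * s / sqrt(n)
    return tolerance, confidence


xbar, s, n, t = 9.28, sqrt(283.71), 25, 2.064
tol, conf = half_widths(s, n, t)
print(f"tolerance limits:  {xbar - tol:.2f} to {xbar + tol:.2f}")
print(f"confidence limits: {xbar - conf:.2f} to {xbar + conf:.2f}")
```

Under these assumptions the tolerance limits are wider than the confidence limits by the factor $\sqrt{n + 1}$, since they must cover individual values of the universe rather than its mean.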

THE METHOD OF MAXIMUM LIKELIHOOD

We shall illustrate the method of maximum likelihood for determining
the best estimates of the population values by applying the method to the
derivation of the estimates of the five parameters required to specify
a normal correlation surface (Refs. 28 and 19). The five parameters
are the means of the two normal distributions of the variates $X$ and $Y$,
$\mu$ and $\xi$, respectively; the two standard deviations, $\sigma_X$ and $\sigma_Y$; and $\rho$,
the correlation coefficient. It is assumed that the regression
both ways ($X$ on $Y$ or $Y$ on $X$) is linear and that the variables $X$ and $Y$
are normally distributed. Then the probability
distributions of $X$, $Y$, and $XY$, namely, $P(X)$, $P(Y)$, and $P(X,Y)$, are

$$P(X) = \frac{1}{\sigma_X\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma_X^2}} \tag{6.22}$$

$$P(Y) = \frac{1}{\sigma_Y\sqrt{2\pi}}\, e^{-\frac{(Y-\xi)^2}{2\sigma_Y^2}} \tag{6.23}$$

$$P(X,Y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\, \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(X-\mu)^2}{\sigma_X^2} - \frac{2\rho(X-\mu)(Y-\xi)}{\sigma_X\sigma_Y} + \frac{(Y-\xi)^2}{\sigma_Y^2}\right]\right\} \tag{6.24}$$

With $N$ pairs of values, the simultaneous probability distribution of all
the $N$ values of $X$ and $Y$ is

$$P(X_1, \ldots, X_N, Y_1, \ldots, Y_N) = \left(2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}\right)^{-N} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{\sum(X-\mu)^2}{\sigma_X^2} - \frac{2\rho\sum(X-\mu)(Y-\xi)}{\sigma_X\sigma_Y} + \frac{\sum(Y-\xi)^2}{\sigma_Y^2}\right]\right\} \tag{6.25}$$

To obtain the maximum likelihood estimates, the process consists in
taking the partial derivatives of the probability functions with respect
to the parameters, setting the resulting equations equal to zero, and
then solving the simultaneous equations for the parameters.
Here, then, the procedure consists in taking the partial derivatives of
$P(X_1, \ldots, X_N, Y_1, \ldots, Y_N)$ with respect to $\mu$, $\xi$, $\sigma_X$, $\sigma_Y$, and $\rho$. It is
convenient to work with the natural logarithms. Hence, for (6.25) we
have

$$\log_e P = -N\log_e 2\pi - N\log_e\sigma_X - N\log_e\sigma_Y - \frac{N}{2}\log_e(1-\rho^2) - \frac{1}{2(1-\rho^2)}\left[\frac{\sum(X-\mu)^2}{\sigma_X^2} - \frac{2\rho\sum(X-\mu)(Y-\xi)}{\sigma_X\sigma_Y} + \frac{\sum(Y-\xi)^2}{\sigma_Y^2}\right] \tag{6.26}$$

Then $\log P$ is differentiated with respect to $\mu$ and the equation is set equal
to zero, giving

$$\frac{\partial \log P}{\partial \mu} = \frac{1}{1-\rho^2}\left[\frac{\sum(X-\mu)}{\sigma_X^2} - \frac{\rho\sum(Y-\xi)}{\sigma_X\sigma_Y}\right] = 0 \tag{6.27}$$

From which

$$\sigma_Y\sum(X-\mu) = \rho\,\sigma_X\sum(Y-\xi) \tag{6.28}$$

Likewise, differentiating $\log P$ with respect to $\xi$, setting the derivative
equal to zero, and reducing, we get

$$\sigma_X\sum(Y-\xi) = \rho\,\sigma_Y\sum(X-\mu) \tag{6.29}$$

Assuming $\rho^2 \neq 1$, $\sigma_X \neq 0$, $\sigma_Y \neq 0$, we get by solving equations (6.28) and
(6.29) the optimum estimates:

$$\hat{\mu} = \frac{\sum X}{N} = \bar{X} \tag{6.30}$$

$$\hat{\xi} = \frac{\sum Y}{N} = \bar{Y} \tag{6.31}$$
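Similarly, we may differentiate $\log P$ partially with respect to $\sigma_X$, $\sigma_Y$,
and $\rho$, respectively; set the equations equal to zero; solve; and substitute
the values given for $\mu$ and $\xi$ in (6.30) and (6.31); obtaining

$$\hat{\sigma}_X = \sqrt{\frac{\sum X^2}{N} - \frac{(\sum X)^2}{N^2}} = s_X \tag{6.32}$$

$$\hat{\sigma}_Y = \sqrt{\frac{\sum Y^2}{N} - \frac{(\sum Y)^2}{N^2}} = s_Y \tag{6.33}$$

$$\hat{\rho} = \frac{\dfrac{\sum XY}{N} - \dfrac{(\sum X)(\sum Y)}{N^2}}{\sqrt{\left[\dfrac{\sum X^2}{N} - \dfrac{(\sum X)^2}{N^2}\right]\left[\dfrac{\sum Y^2}{N} - \dfrac{(\sum Y)^2}{N^2}\right]}} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} = r \tag{6.34}$$

Jackson and Ferguson (Ref. 19) have shown that the maximum likelihood
estimate of $\rho$ in the case of samples from a population specified by
four parameters ($\sigma_X = \sigma_Y = \sigma$; $\mu$; $\xi$; $\rho$) is

$$\hat{\rho} = \frac{2\left[\sum XY - \dfrac{(\sum X)(\sum Y)}{N}\right]}{\sum X^2 + \sum Y^2 - \dfrac{(\sum X)^2 + (\sum Y)^2}{N}} \tag{6.35}$$

This is the case in determining the reliability coefficient of a test by the
test-retest and alternative equivalent forms methods.

In the same way, the maximum likelihood estimate of $\rho$ obtained from
samples of a population specified by three parameters ($\sigma_X = \sigma_Y = \sigma$;
$\mu = \xi$; $\rho$) is

$$\hat{\rho} = \frac{2\sum XY - \dfrac{(\sum X + \sum Y)^2}{2N}}{\sum X^2 + \sum Y^2 - \dfrac{(\sum X + \sum Y)^2}{2N}} \tag{6.36}$$

This is the case in determining the reliability coefficient of a test by the
split-half method.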


ESTIMATING THE RELIABILITY OF TESTS

The reliability of measurements is a fundamental tenet in all observational
and experimental sciences. The problem of the reliability of
instruments of measurement has, however, received the greatest consideration
in psychology, education, and sociology.

The traditional method of determining the reliability of a test is
through the use of the product-moment correlation coefficient. The term
"reliability of a test" as introduced by Spearman in 1910 was defined
as the (correlation) "coefficient between one half and the other half of
several measurements of the same thing." (Ref. 19.)

Until recently, the only methods available for measuring the so-called
"reliability of a test" were (1) the test-retest method, doing the same
test twice; (2) obtaining the correlation between the scores on equivalent
forms of the test; (3) the split-test method, consisting in obtaining the
correlation coefficient between the scores on the odd and even items of
the test. This correlation gives an estimate of the reliability of each
half of the test. To obtain the reliability of the whole test, application
is made of the Spearman-Brown formula.

Recently, other approaches to the problem of obtaining reliability
have been made. A number of methods, both the traditional and the
more recent, will be discussed and illustrated in the following pages.

Problem VI.9. Comparison of the split-test and the maximum likelihood
methods. We shall compare the results from determining the
reliability of a test by the split-test method, using the product-moment
correlation coefficient, with the results obtained from applying the
maximum likelihood method. The comparison was made on a test in
biology from which the scores of a random sample of 25 students are
listed in Table 33.

TABLE 33
THE SCORES OF A RANDOM SAMPLE OF 25
STUDENTS ON A BIOLOGY TEST

Individual    Odd, X    Even, Y
     1          227       226
     2          124       111
     3          210       237
     4          178       161
     5          192       188
     6          104        93
     7          191       201
     8          148       168
     9          125       123
    10          141       157
    11          171       178
    12          168       182
    13          129       118
    14          192       222
    15          176       171
    16          172       180
    17          215       224
    18          102       144
    19          177       176
    20          109       125
    21          146       150
    22          180       184
    23          179       193
    24          141       131
    25          141       135
  Total        4038      4178

Before applying the split-test method, it was necessary to test the
underlying assumptions, namely, that the means on the two halves of the
test are equal and that the standard deviations are equal. The $t$-test
for the former ($t_0 = 1.917$) and the $F$-test for the latter ($F_0 = 1.238$)
give probability values $P > .05$. Therefore we may consider the assumptions
satisfied and proceed to determine the correlation between the two
halves of the test by calculating the product-moment correlation coefficient
and by getting the maximum likelihood estimate.

The product-moment correlation coefficient between the scores on the
two halves of the test is given by

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} = \frac{29{,}812.44}{\sqrt{(28{,}930.24)(35{,}816.64)}} = 0.9262 \text{ (or 0.93)}$$

The maximum likelihood estimate is given by

$$\hat{\rho} = \frac{2\sum XY - \dfrac{(\sum X + \sum Y)^2}{2N}}{\sum X^2 + \sum Y^2 - \dfrac{(\sum X + \sum Y)^2}{2N}} = \frac{2(704{,}643) - 1{,}350{,}053.12}{681{,}148 + 734{,}044 - 1{,}350{,}053.12} = 0.9093 \text{ (or 0.91)}$$
Although the difference between the two estimates in this problem
cannot be said to be large, we accept the maximum likelihood estimate
as the optimum estimate.
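Both estimates can be recomputed from the scores of Table 33. The sketch below uses the product-moment formula and the three-parameter (split-half) maximum likelihood formula given earlier in the chapter; the function names are illustrative. Computed without intermediate rounding, the product-moment coefficient comes out at about 0.926 and the maximum likelihood estimate at about 0.909.

```python
from math import sqrt

# Odd-item (X) and even-item (Y) scores of the 25 students in Table 33.
X = [227, 124, 210, 178, 192, 104, 191, 148, 125, 141, 171, 168, 129,
     192, 176, 172, 215, 102, 177, 109, 146, 180, 179, 141, 141]
Y = [226, 111, 237, 161, 188, 93, 201, 168, 123, 157, 178, 182, 118,
     222, 171, 180, 224, 144, 176, 125, 150, 184, 193, 131, 135]


def product_moment(x, y):
    """Product-moment correlation coefficient r."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def ml_split_half(x, y):
    """Maximum likelihood estimate of rho when means and variances are taken as equal."""
    n = len(x)
    common = (sum(x) + sum(y)) ** 2 / (2 * n)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (2 * sxy - common) / (sxx + syy - common)


print(f"product-moment r      = {product_moment(X, Y):.4f}")
print(f"maximum likelihood rho = {ml_split_half(X, Y):.4f}")
```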

Problem VI.10. Comparison of the product-moment correlation
coefficient and the maximum likelihood estimate for determining the
reliability of a test by means of the equivalent forms method. The comparison
of these two methods of estimating test reliability was made on
the scores of two forms of a reading test made by a random sample of
30 students. The data are given in Table 34.

TABLE 34
THE SCORES ON TWO FORMS OF A READING TEST OF A
RANDOM SAMPLE OF 30 STUDENTS

Individual    Form B, X    Form A, Y
     1           46           39
     2           47           45
     3           46           42
     4           27           35
     5           59           53
     6           74           64
     7           30           27
     8           50           41
     9           56           50
    10           41           43
    11           24           25
    12           27           32
    13           37           34
    14           59           54
    15           36           38
    16           42           42
    17           41           45
    18           49           39
    19           29           28
    20           57           50
    21           27           26
    22           49           46
    23           34           26
    24           14           23
    25           44           50
    26           48           46
    27           61           64
    28           70           69
    29           58           49
    30           50           60
 N = 30  Total  1332         1285

It was found that the value given for the product-moment correlation
coefficient was .9164 or 0.92 and the maximum likelihood estimate was
… Although the difference in this problem is small, we
accept the maximum likelihood estimate as the optimum.

It should be noted that previous to the application of the method
of estimation, the fundamental assumptions underlying the equivalent
forms method of testing reliability have been tested. The assumption
here is that the standard deviations on the scores of the two forms of the
test are equal. The $F$-test ($F_0 = 1.32$) showed that this assumption was
satisfied.

A more stringent test of the equivalence of the two forms of the test
can be made by applying three sample criteria proposed by Wilks (Ref.
34) for testing the equality of means, equality of variances, and equality
of covariances. The $L_{mvc}$ criterion (two variables) is

$$L_{mvc} = \frac{s_{11}s_{22} - s_{12}^2}{\left[\tfrac{1}{2}(s_{11} + s_{22}) + \tfrac{1}{4}(\bar{X}_1 - \bar{X}_2)^2\right]^2 - \left[s_{12} - \tfrac{1}{4}(\bar{X}_1 - \bar{X}_2)^2\right]^2} \tag{6.37}$$

where $\bar{X}_1$ and $\bar{X}_2$ are the means; $s_{11}$ and $s_{22}$ are the variances; and $s_{12}$ is
the covariance, between the two forms.

Although tests of significance may be made by the use of the prepared
tables, an exact level of significance is given by

$$P = L_{mvc}^{\frac{1}{2}(N-2)}$$

From the scores of the thirty individuals given in Table 34, we make
the following calculations:

$$\bar{X}_1 = 44.4$$
$$\bar{X}_2 = 42.83$$
$$s_{11} = \tfrac{1}{30}(64{,}938) - (44.4)^2 = 193.24$$
$$s_{22} = \tfrac{1}{30}(59{,}429) - (42.8333)^2 = 146.2751$$
$$s_{12} = \tfrac{1}{30}(61{,}676) - (44.4000)(42.8333) = 154.0682$$
$$\bar{X} = \tfrac{1}{2}(\bar{X}_1 + \bar{X}_2) = \tfrac{1}{2}(87.23) = 43.62$$

Substituting the values required in (6.37), we get

$$L_{mvc} = .8268$$

and

$$P = L_{mvc}^{\frac{1}{2}(30-2)} = (.8268)^{14} = .07$$

Therefore, we conclude that the two forms of the test are parallel or
equivalent.
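The criterion can be recomputed from the sums of Table 34. The sketch below assumes the two-variable form $L_{mvc} = (s_{11}s_{22} - s_{12}^2) / \{[\tfrac{1}{2}(s_{11}+s_{22}) + \tfrac{1}{4}(\bar{X}_1-\bar{X}_2)^2]^2 - [s_{12} - \tfrac{1}{4}(\bar{X}_1-\bar{X}_2)^2]^2\}$ and the significance level $P = L_{mvc}^{(N-2)/2}$, both of which agree numerically with the values .8268 and .07 quoted in the text. The function name `l_mvc` is illustrative, and the sums of squares and products are those appearing in the calculations above.

```python
def l_mvc(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Return (L_mvc, P) for two variables, with variances computed on divisor n."""
    mean_x, mean_y = sum_x / n, sum_y / n
    s11 = sum_xx / n - mean_x ** 2
    s22 = sum_yy / n - mean_y ** 2
    s12 = sum_xy / n - mean_x * mean_y
    quarter_sq_diff = (mean_x - mean_y) ** 2 / 4
    pooled_var = (s11 + s22) / 2 + quarter_sq_diff
    pooled_cov = s12 - quarter_sq_diff
    criterion = (s11 * s22 - s12 ** 2) / (pooled_var ** 2 - pooled_cov ** 2)
    return criterion, criterion ** ((n - 2) / 2)


# Sums for the 30 students of Table 34 (Form B = X, Form A = Y).
L, P = l_mvc(30, 1332, 1285, 64938, 59429, 61676)
print(f"L_mvc = {L:.4f}, P = {P:.2f}")
```

Because the means are carried here without rounding, the criterion may differ from the text's value in the fourth decimal place.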

Problem VI.11. Determining the sensitivity of a test. Jackson
(Ref. 18) applied analysis of variance methods and the methods of testing
statistical hypothesis to the problem of determining the reliability of a
test.³ He treated four different problems: the determinations of (1) the
existence of a significant practice effect, (2) whether or not the test
measures the capacities of the individuals tested, and the estimation of
(3) practice effect, if it is found to exist; and (4) the relative importance
of the random errors of measurement with respect to the true measurement
of the capacity of the individual. Jackson introduced a new
statistic, $\gamma$, which he called the sensitivity of the test, defined as the ratio
of the standard deviation of true scores to the standard deviation of the
distribution of errors of measurement.

The method of Jackson is applied to the scores of a random sample of
30 students on two forms A and B of a reading test, the same data as
were used in Problem VI.10. The original data and calculations are
given in Table 35.

It is assumed that each individual's score on the test is the sum of a
number of independent components and that the analysis gives a measure
of the influence of each. One component is the difference in ability
between the individuals tested. Noting the scores of the individual
students in columns (2) and (3), it is observed that the students on the
average make higher scores on form B than on form A. Form A was
given first, so that this difference is called a measure of the "practice"
effect. Even when allowance is made for the influence of practice effect,

³ The student may find it advantageous to follow through this method after he has
studied the analysis of variance (see page 226). For the method of testing statistical
hypothesis see page 63.


TABLE 35
SCORES OF FRESHMAN STUDENTS IN THE COLLEGE OF AGRICULTURE ON FORMS A AND B
OF A READING TEST

Student    Score on     Score on     Sum of scores   Difference
No.        Form B, X    Form A, Y    X + Y           X - Y
(1)        (2)          (3)          (4)             (5)
 1         46           39            85               7
 2         47           45            92               2
 3         46           42            88               4
 4         27           35            62              -8
 5         59           53           112               6
 6         74           64           138              10
 7         30           27            57               3
 8         50           41            91               9
 9         56           50           106               6
10         41           43            84              -2
11         24           25            49              -1
12         27           32            59              -5
13         37           34            71               3
14         59           54           113               5
15         36           38            74              -2
16         42           42            84               0
17         41           45            86              -4
18         49           39            88              10
19         29           28            57               1
20         57           50           107               7
21         27           26            53               1
22         49           46            95               3
23         34           26            60               8
24         14           23            37              -9
25         44           50            94              -6
26         48           46            94               2
27         61           64           125              -3
28         70           69           139               1
29         58           49           107               9
30         50           60           110             -10
Sum              1332         1285         2617          47
Sum of squares   64,938       59,429       247,719       1015

the scores on the two forms differ considerably. It is assumed that these
residual differences are attributable to the errors of measurement of the
test used. Possibly other factors exist, such as possible fluctuations
in the ability of the individual students and differences between the two
forms. Since these factors are not isolated, they are included, if they
exist, in the measurement of error. The method used to measure the
effect of each of the components is that of the analysis of variance, which
consists in breaking up the sum of squares of the deviations about the
grand mean into parts assigned to the respective factors. In this way
the importance of the influence of the respective components can be
established and conclusions can be made with respect to the value of the
test as a measuring instrument.

The calculations involved in the analysis of variance are as follows: 

(1) Calculate for each student the sum of the scores and the difference
between his scores on the two forms, as indicated in columns (4) and (5),
Table 35.

(2) Calculate the sum and sum of squares of the numerical values
in each of the columns (2), (3), (4), and (5), and record these in the two
bottom rows of the table. Note the following checks on the calculations:

(a) 1332 + 1285 = 2617

(b) 1332 - 1285 = 47

(c) 247,719 + 1015 = 2(64,938 + 59,429)

(3) Calculate the sum of squares for each component as follows:

(a) For error: $\frac{1}{2}\left[1015 - \frac{(47)^2}{30}\right] = 470.683$

(b) For between individuals: $\frac{1}{2}\left[247{,}719 - \frac{(2617)^2}{30}\right] = 9714.683$

(c) For practice effect: $\frac{(47)^2}{60} = 36.817$

(d) For total: $64{,}938 + 59{,}429 - \frac{(2617)^2}{60} = 10{,}222.183$
These values are then recorded in an analysis of variance table (Table 
36). 
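
The partition of the sum of squares described in steps (1) to (3) can be reproduced from the raw scores. The sketch below is illustrative only: the list and variable names are not from the text, and the scores are those of Table 35 with the misprinted entries read so that the column totals and sums of squares agree with the bottom rows of the table.

```python
# Scores from Table 35: Form B (X) and Form A (Y) for the 30 students.
form_b = [46, 47, 46, 27, 59, 74, 30, 50, 56, 41, 24, 27, 37, 59, 36,
          42, 41, 49, 29, 57, 27, 49, 34, 14, 44, 48, 61, 70, 58, 50]
form_a = [39, 45, 42, 35, 53, 64, 27, 41, 50, 43, 25, 32, 34, 54, 38,
          42, 45, 39, 28, 50, 26, 46, 26, 23, 50, 46, 64, 69, 49, 60]

n = len(form_b)
sums = [x + y for x, y in zip(form_b, form_a)]    # column (4)
diffs = [x - y for x, y in zip(form_b, form_a)]   # column (5)

# Step (3): sum of squares assigned to each component.
ss_error = (sum(d * d for d in diffs) - sum(diffs) ** 2 / n) / 2
ss_individuals = (sum(s * s for s in sums) - sum(sums) ** 2 / n) / 2
ss_practice = sum(diffs) ** 2 / (2 * n)
ss_total = (sum(x * x for x in form_b) + sum(y * y for y in form_a)
            - sum(sums) ** 2 / (2 * n))

# Mean squares, using 1, n - 1 and n - 1 degrees of freedom.
ms_practice = ss_practice / 1
ms_individuals = ss_individuals / (n - 1)
ms_error = ss_error / (n - 1)
print(round(ss_error, 3), round(ss_individuals, 3),
      round(ss_practice, 3), round(ss_total, 3))
```

The three component sums of squares add to the total, which is a useful check on any hand calculation of this kind.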


TABLE 36
ANALYSIS OF VARIANCE OF SCORES OF FRESHMEN ON TWO FORMS OF A READING TEST

Source of variation     Degrees of freedom   Sum of squares   Mean square
Practice effect                  1                 36.817          36.817
Between individuals             29              9,714.683         334.989
Error                           29                470.683          16.230
Total                           59             10,222.183

The following applications can now be made of the results shown in
Table 36.

An estimate of the standard error of measurement of an individual
score, $s_x$, is obtained by taking the square root of the error mean square.
We get

$s_x = \sqrt{16.230} = 4.03$ score units


This gives a direct estimate of the absolute accuracy of the measurements. 

The next problem is to test whether there is a significant practice
effect, that is, whether it is significantly different from zero. This hypothesis
is tested by calculating first the ratio of the mean square due to practice
effect to the mean square due to error:

$F = \frac{36.817}{16.230} = 2.27$

We then refer to Snedecor's table (Table IV, Appendix) of F with degrees
of freedom $n_1 = 1$ and $n_2 = 29$. We find that the 5 per cent point of
F is 4.18. Since the observed value of F, 2.27, is less than 4.18, we
conclude that there is no significant practice effect.

The next step is to find out whether the test measures sufficiently
accurately to distinguish among the individual students. This is determined
by taking the ratio of the mean square between individuals to the
error mean square. Thus:

$F = \frac{334.989}{16.230} = 20.64$

Referring to Snedecor's table with $n_1 = n_2 = 29$, we find that for $n_1 = 30$
and $n_2 = 29$ (the nearest tabulated entry), $F_{.05} = 1.85$ and $F_{.01} = 2.41$. Therefore we conclude, since
20.64 is greater than 2.41, that the two mean squares differ significantly
and hence that the test measures with sufficient accuracy to distinguish
between the individuals tested.
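
The two variance-ratio tests can be written out as a short check. In this sketch the mean squares are those of Table 36 and the critical values are the tabulated points quoted above from Snedecor's table; the variable names are illustrative and no F distribution is computed by the code itself.

```python
# Mean squares from Table 36.
ms_practice, ms_individuals, ms_error = 36.817, 334.989, 16.230

# Tabulated critical points quoted in the text (not computed here).
f05_practice = 4.18      # 5 per cent point, n1 = 1, n2 = 29
f01_individuals = 2.41   # 1 per cent point, n1 = 30, n2 = 29

f_practice = ms_practice / ms_error
f_individuals = ms_individuals / ms_error

practice_significant = f_practice > f05_practice
individuals_significant = f_individuals > f01_individuals
print(round(f_practice, 2), practice_significant)
print(round(f_individuals, 2), individuals_significant)
```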

The next problem is to determine the relative accuracy of measurement,
that is, the relation between the magnitude of the errors of measurement
and the size of the differences among individuals. This is given by
Jackson's measure, γ, called the sensitivity of the test:

$\gamma = \frac{\sigma_c}{\sigma}$

where $\sigma_c$ is the standard deviation of the distribution of ability in the
population sampled, and $\sigma$ is the standard deviation of the distribution of
errors of measurement.

The unique estimate of γ is obtained as follows:

(a) Subtract the error mean square from the mean square between
individuals.
(b) Divide the difference by twice the error mean square.
(c) Take the square root of the quotient as an estimate of γ.

From the values in Table 36, we get

(a) 334.989 - 16.230 = 318.759

(b) $\frac{318.759}{2(16.23)} = 9.82$

(c) Estimated $\gamma = \sqrt{9.82} = 3.13$

The confidence interval is set up as follows:

(a) Calculate the ratio of the mean square between individuals to the
error mean square, denoted by F.

(b) Obtain the $F_{.05}$ and $F_{.01}$ points of the distribution of F from
Snedecor's table.

(c) The lower limit of the interval, using $F_{.01}$ for example, denoted by
$\underline{\gamma}$, is given by

$\underline{\gamma} = \sqrt{\frac{F}{2F_{.01}} - \frac{1}{2}}$  (6.38)

(d) The upper limit of the interval, $\bar{\gamma}$, using $F_{.01}$ for example, is
obtained from

$\bar{\gamma} = \sqrt{\frac{F \, F_{.01}}{2} - \frac{1}{2}}$  (6.39)

(e) We may make the statement that

$\underline{\gamma} \le \gamma \le \bar{\gamma}$

and the probability that the statement is correct is .98.

For our problem, we get

(a) $F = \frac{334.989}{16.230} = 20.64$

(b) $F_{.01} = 2.42$

(c) $\underline{\gamma} = \sqrt{\frac{20.64}{4.84} - \frac{1}{2}} = 1.94$

(d) $\bar{\gamma} = \sqrt{\frac{(20.64)(2.42)}{2} - \frac{1}{2}} = 4.95$

(e) $1.94 < \gamma < 4.95$


Jackson gives the following relation between the sensitivity and reliability
coefficient in the population:

$\rho = \frac{\gamma^2}{1 + \gamma^2}$  (6.40)

where ρ denotes the population reliability coefficient.

From Jackson's Table XI (Ref. 18), the values of ρ corresponding to
γ = 1.94 and γ = 4.95 are approximately .80 and .96, respectively.
The true values of γ and of ρ are, of course, unknown.
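
The estimate of the sensitivity, its limits, and the corresponding reliability values can be sketched as follows. The function names are illustrative and not Jackson's; the sketch assumes, as in this problem, that the two mean squares have equal degrees of freedom so that a single tabulated point serves for both limits, and it assumes the observed F is large enough for the lower limit to be real.

```python
import math

def sensitivity_estimate(ms_individuals, ms_error):
    """Point estimate of gamma: sqrt((MS_ind - MS_err) / (2 * MS_err))."""
    return math.sqrt((ms_individuals - ms_error) / (2 * ms_error))

def sensitivity_limits(f_observed, f_critical):
    """Lower and upper limits following (6.38) and (6.39).

    Assumes equal degrees of freedom for the two mean squares and
    f_observed > f_critical, so that both square roots are real.
    """
    lower = math.sqrt(f_observed / (2 * f_critical) - 0.5)
    upper = math.sqrt(f_observed * f_critical / 2 - 0.5)
    return lower, upper

def reliability_from_sensitivity(gamma):
    """Relation (6.40): rho = gamma^2 / (1 + gamma^2)."""
    return gamma ** 2 / (1 + gamma ** 2)

gamma_hat = sensitivity_estimate(334.989, 16.230)
lower, upper = sensitivity_limits(20.64, 2.42)
rho_lower = reliability_from_sensitivity(1.94)
rho_upper = reliability_from_sensitivity(4.95)
print(round(gamma_hat, 2), round(lower, 2), round(upper, 2))
print(round(rho_lower, 2), round(rho_upper, 2))
```

Applied to the limits 1.94 and 4.95, relation (6.40) gives values close to the .80 and .96 read from Jackson's table.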

Problem VI.12. Determination of the reliability coefficient by means
of the analysis of variance. Hoyt (Ref. 17) developed a formula for
estimating the reliability of a test, also by means of the method of analysis
of variance. The data used in the calculation are the numbers of correct
responses, tabulated as in Table 37.

TABLE 37
TABULATION OF DATA NECESSARY FOR DETERMINING RELIABILITY BY HOYT METHOD

Individual      Item                                     Score
                1          2          ...   k
1               $X_{11}$   $X_{12}$   ...   $X_{1k}$     $\sum_s X_{1s}$
2               $X_{21}$   $X_{22}$   ...   $X_{2k}$     $\sum_s X_{2s}$
...
n               $X_{n1}$   $X_{n2}$   ...   $X_{nk}$     $\sum_s X_{ns}$
Total           $\sum_i X_{i1}$   $\sum_i X_{i2}$   ...   $\sum_i X_{ik}$   $\sum_i \sum_s X_{is}$

Here i = 1, 2, ..., n; s = 1, 2, ..., k; k denotes the number of items; n
is the number of individuals; $X_{is}$ denotes the score of the ith individual
on the sth item, which is presumably 1 or 0.


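Let us define:

Grand mean, $\bar{X} = \frac{\sum_i \sum_s X_{is}}{N}$, where N = kn.

Mean of columns, $\bar{X}_{\cdot s} = \frac{\sum_i X_{is}}{n}$

Mean of rows, $\bar{X}_{i \cdot} = \frac{\sum_s X_{is}}{k}$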



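The sum of squares between items is

$n \sum_s (\bar{X}_{\cdot s} - \bar{X})^2 = \frac{\sum_s \left(\sum_i X_{is}\right)^2}{n} - \frac{\left(\sum_i \sum_s X_{is}\right)^2}{N}$  (6.41)

The sum of squares between individuals is

$k \sum_i (\bar{X}_{i \cdot} - \bar{X})^2 = \frac{\sum_i \left(\sum_s X_{is}\right)^2}{k} - \frac{\left(\sum_i \sum_s X_{is}\right)^2}{N}$  (6.42)

Since $X_{is}$ = 1 or 0,

$X_{is}^2 = X_{is}$

and the total sum of squares is

$\sum_i \sum_s (X_{is} - \bar{X})^2 = n_1 - \frac{n_1^2}{N} = \frac{n_1 n_2}{N}$  (6.43)

where $n_1 = \sum_i \sum_s X_{is}$, or the number of correct responses of all individuals
on all the items, and $n_2$ is the number of incorrect responses.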
We shall apply this method to an examination in college mathematics
consisting of 80 items (k = 80) for a class of 119 students (n = 119).
The calculations (summary values only) are:

(1) The sum of squares between individuals:

$\frac{338{,}042}{80} - \frac{(6216)^2}{9520} = 166.8426$

(2) The sum of squares between items:

$\frac{528{,}634}{119} - \frac{(6216)^2}{9520} = 383.6201$

(3) The total sum of squares:

$\frac{n_1 n_2}{N} = \frac{(6216)(9520 - 6216)}{9520} = 2157.3176$

These values are then recorded in Table 38, Analysis of Variance.

TABLE 38
ANALYSIS OF VARIANCE OF THE SCORES ON A TEST IN COLLEGE MATHEMATICS

Source of variation     D.F.    Sum of squares   Mean squares   F       Hypothesis tested
Between individuals      118       166.8426       1.4139 (a)     8.20   Reject
Between items             79       383.6201       4.8560 (b)    28.17   Reject
Residual                9322      1606.8549       0.1724 (c)
Total                   9519      2157.3176

$F(1) = \frac{(a)}{(c)} = \frac{1.4139}{0.1724} = 8.20$, P < .01

$F(2) = \frac{(b)}{(c)} = \frac{4.8560}{0.1724} = 28.17$, P < .01

The following uses can be made of the results in the table:

(1) To test the hypothesis that there is no difference between the
means of individuals. We calculate the ratio of the mean square due to
individuals to the mean square of residual: $F = \frac{1.4139}{0.1724} = 8.20$. We
then refer to Snedecor's table of F (Table IV, Appendix) with degrees
of freedom $n_1 = 118$ and $n_2 = 9322$. We find that the 1 per cent point
of F for $n_1 = 100$ and $n_2 = \infty$ is 1.36. We could interpolate to get the
value for $n_1 = 118$ and $n_2 = 9322$, but this operation is unnecessary, since
it is obvious that the obtained value of F will be much greater than the
table value. Therefore, we conclude that the test measures sufficiently
accurately to differentiate among individuals.

(2) To estimate the precision with which the test measures, we may
compute the reliability coefficient, $r_{tt}$, as follows:

$r_{tt} = \frac{a - c}{a} = \frac{1.4139 - 0.1724}{1.4139} = 0.88$  (6.44)

A measure of the absolute accuracy of the test is given by the standard
error of measurement of an individual score, $s_x$, where

$s_x = \sqrt{\frac{\text{residual s.s.}}{\text{D.F. between individuals}}} = \sqrt{\frac{1606.8549}{118}} = 3.69$ score units
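
The analysis of variance for Hoyt's method needs only three summary values. The following is a minimal sketch using the figures quoted for the college mathematics examination; the variable names are illustrative and not Hoyt's notation.

```python
# Summary values quoted in the text: 80 items, 119 students.
k, n = 80, 119
N = k * n
total_correct = 6216              # n1, all correct responses
sum_sq_person_totals = 338_042    # sum of squared individual scores
sum_sq_item_totals = 528_634      # sum of squared item totals

correction = total_correct ** 2 / N
ss_individuals = sum_sq_person_totals / k - correction   # (6.42)
ss_items = sum_sq_item_totals / n - correction           # (6.41)
ss_total = total_correct * (N - total_correct) / N       # (6.43)
ss_residual = ss_total - ss_individuals - ss_items

ms_individuals = ss_individuals / (n - 1)
ms_items = ss_items / (k - 1)
ms_residual = ss_residual / ((n - 1) * (k - 1))

# Reliability coefficient (6.44): (a - c) / a.
reliability = (ms_individuals - ms_residual) / ms_individuals
print(round(ss_individuals, 4), round(ss_items, 4), round(ss_residual, 4))
print(round(reliability, 2))
```

The residual sum of squares is obtained by subtraction, which is why only the two marginal sums of squares and the total number of correct responses are required.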


Problem VI.13. Determination of the reliability of the test by the
method of rational equivalence. Kuder and Richardson developed a
method of determining the reliability of a test, known as the
method of rational equivalence (Ref. 20). The idea of rational
equivalence arises from the conception of a given test as having a
hypothetical parallel form, where every item on the one form is
interchangeable with the corresponding item on the other and
each pair of items is equivalent with respect to content and difficulty.
Furthermore, it is assumed that all corresponding correlations among the
items are equal.

A number of formulas representing varying degrees of rigor are
presented. Only the one recommended for general use is given [Ref.
20, Formula (20)]:

$r_{tt} = \frac{n}{n-1} \cdot \frac{\sigma_t^2 - n\,\overline{pq}}{\sigma_t^2}$  (6.45)

where $r_{tt}$ is the reliability coefficient; n, the number of items; $\sigma_t^2$, the
variance of the test; and $\overline{pq}$, the mean variance of the items.
Jackson and Ferguson (Ref. 19) point out that the derivation of
Formula (6.45) can be made on the basis of the equivalence assumption
only. We present their derivation:

The variance of a test of n items, as a function of the item variances
and interitem covariances, is

$s_t^2 = \sum_i s_i^2 + 2\sum_{i<j} r_{ij}s_is_j = n\,\overline{s_i^2} + n(n-1)\,\overline{r_{ij}s_is_j}$  (6.45a)

where $s_t^2$ = variance of the test
$s_i^2$ = variance of item i
$s_j^2$ = variance of item j
$r_{ij}$ = correlation between items i and j
$\overline{s_i^2}$ = average item variance
$\overline{r_{ij}s_is_j}$ = average item covariance
n = number of items

Assuming the existence of a hypothetical parallel form of the test, also
of n items, the variance of the sum of these two tests is

$s_T^2 = 2n\,\overline{s_i^2} + 2n(2n-1)\,\overline{r_{ij}s_is_j}$  (6.46)

where $s_T^2$ = variance of the sum of scores on the two equivalent forms
of the test.

It is known from the correlation of sums that

$s_T^2 = 2s_t^2(1 + r_{tt})$  (6.47)

where $r_{tt}$ = correlation between the test and its hypothetical equivalent,
or the reliability coefficient.

When the values of $s_t^2$ in (6.45a) and $s_T^2$ in (6.46) are substituted in
(6.47) and the equation is solved for $r_{tt}$, we have

$r_{tt} = \frac{n}{n-1} \cdot \frac{s_t^2 - n\,\overline{s_i^2}}{s_t^2}$  (6.48)

which formula is identical with (6.45).

It is to be noted that the assumption made in this derivation, that
$\overline{r_{ij}s_is_j} = \overline{r_{i'j'}s_{i'}s_{j'}} = \overline{r_{i'j}s_{i'}s_j}$ (that is, that the covariances are on the average
equal), is somewhat less rigorous than in the equivalence assumption.
In the latter it is specified that $r_{ij} = r_{i'j'} = r_{i'j}$, and $s_i = s_{i'}$, where the
primes (') refer to the hypothetical equivalent form.

As an illustration of this method, we present the results of the
administration of an Industrial Relations Classification Test of 100 test items
to a college class of 62 students. An analysis of the scores on the tests
gave the following values:

Test variance, $\sigma_t^2$, = 169.5067
Average variance of the test items, $\overline{pq}$, = .148299

Reliability coefficient, $r_{tt} = \frac{n}{n-1} \cdot \frac{\sigma_t^2 - n\,\overline{pq}}{\sigma_t^2} = \frac{100}{99} \cdot \frac{169.5067 - 100(.148299)}{169.5067} = .92$

Formula (6.45) is not in an efficient form for calculation. Hoyt
(Ref. 16) suggests the following variant:

$r_{tt} = \frac{n}{n-1} \cdot \frac{kS_s + S_i - T(T + k)}{kS_s - T^2}$  (6.49)

where T = sum of scores of all individuals
$S_s$ = sum of squares of each of the scores for all individuals
$S_i$ = sum of squares of each of the total correct responses for all
items
k = number of individuals taking the test
n = number of items in the test

Applying (6.49) to the data from the above test, we get:

$r_{tt} = \frac{100}{99} \cdot \frac{62(52{,}734) + 42{,}929 - 1618(1618 + 62)}{62(52{,}734) - (1618)^2} = .92$
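
The two computing forms can be compared directly. This is a sketch with illustrative function names, applied to the summary values quoted for the Industrial Relations Classification Test; it is not presented as the authors' own procedure.

```python
def rational_equivalence(n_items, test_variance, mean_item_variance):
    """Formula (6.45): reliability from test variance and mean item variance."""
    return (n_items / (n_items - 1)
            * (test_variance - n_items * mean_item_variance) / test_variance)

def hoyt_variant(n_items, k_persons, total, ss_scores, ss_item_totals):
    """Formula (6.49): the same coefficient from raw sums.

    total          -- T, sum of scores of all individuals
    ss_scores      -- S_s, sum of squared individual scores
    ss_item_totals -- S_i, sum of squared item totals
    """
    numerator = k_persons * ss_scores + ss_item_totals - total * (total + k_persons)
    denominator = k_persons * ss_scores - total ** 2
    return n_items / (n_items - 1) * numerator / denominator

r_from_variances = rational_equivalence(100, 169.5067, 0.148299)
r_from_sums = hoyt_variant(100, 62, 1618, 52_734, 42_929)
print(round(r_from_variances, 2), round(r_from_sums, 2))
```

The second form avoids computing the test variance and the item variances separately, which is the efficiency Hoyt's variant is meant to provide.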




DEGREES OF FREEDOM

We have used the concept “degrees of freedom” a number of times
without defining it. Since it is such a fundamental concept in statistics,
we shall try to add to an understanding of it by referring for its interpretation
to three analogous settings: physics, geometry, and statistics.

Physical Interpretation. A rigid body which can move about in
space without changing the direction of any line in it is said to have a
motion of translation. It can also turn about any point, say P, without
the position of P changing, a motion known as a motion of rotation
about P. It can again have a motion compounded of a motion of translation
and one of rotation.

Take any convenient frame of reference, $O(X_1, X_2, X_3)$, fixed in a
rigid body. The position of the rigid body at any instant is defined
uniquely by the position of $O(X_1, X_2, X_3)$. We can specify the position
of the body axes by six parameters, for example, the Cartesian coordinates
$a$ of O, with respect to fixed axes, and the three angular or polar coordinates
$\phi$ of O. Therefore, the rigid body is said to have 6 degrees of
freedom. The 6 degrees of freedom correspond to the positional coordinates
just specified. Of course, other equivalent sets of coordinates
may be taken. However, if a definite relation or relations are fixed or
assigned between the six parameters or positional coordinates, then the
rigid body is said to be subject to geometric or kinematic constraint and
has fewer than 6 degrees of freedom. Each restriction reduces the number
of degrees of freedom by 1. The fixture of one point of the body would
constitute a constraint and reduce the degrees of freedom of the body
by 1. Also, a point might be restricted to lie on a curved guide which in
turn is constrained to move in a prescribed way. Sliding or rolling contact
imposed between the body and either stable or movable guides
represents a more general kind of constraint. The constraints may be
represented by functional relations connecting the positional coordinates
or parameters (Ref. 15).


Geometric Interpretation. The geometric interpretation of degrees
of freedom grows out of a consideration of the conceptions derived from
the geometry of n-dimensional space. The geometrical or vectorial
representation of a sample as a vector⁴ with n orthogonal or mutually
perpendicular components was introduced into statistics by Fisher (Ref.
8). He carried out the first systematic investigations of the problems
underlying the exact sampling distribution of a number of statistics and
thus laid the basis for the solution of many theoretical problems of
statistical distributions.

It is well known that a one-to-one correspondence may be set up
between all real numbers, $x$, and all points on a straight line. A similar
correspondence exists between all pairs of real numbers $(x_1, x_2)$ and all
points in a plane; also between all triplets of real numbers $(x_1, x_2, x_3)$
and all points in a space of three dimensions. We may, then, generalize
by considering any system of n real numbers $(x_1, x_2, \ldots, x_n)$ as representing
a point or vector $x$ in the n-dimensional (Euclidean) sample space,
$V_n$. A point in a line has freedom of movement in one dimension; that
is, it has 1 degree of freedom. A plane has two dimensions and a point
on a plane has 2 degrees of freedom. Likewise, in ordinary space of three
dimensions, a point in this space has 3 degrees of freedom. Generalizing,
a point in n-dimensional space may be said to have n degrees of freedom.

⁴ A vector is a quantity which has magnitude and direction. It is a matrix
consisting of one single row or column.

The numbers or values of the respective elements of a sample, $x_1, x_2,
\ldots, x_n$, are, then, the coordinates of the sample point P in multiple
dimensional space. The dimensionality of the sample point P is the
number of observations, n, in the sample. There are n degrees of freedom.
However, if a restriction be placed on the sample point, the number of
degrees of freedom is decreased by 1; that is, its dimensionality is reduced
by 1 and thus becomes n - 1. Correspondingly, each additional restriction
or section through sample space carries with it an additional reduction
in the dimensionality or number of coordinates. Thus to restrict the
point in three-dimensional space to a surface, one condition is imposed on
its coordinates. To restrict a point in space of three dimensions to a
curve, it is necessary to subject its coordinates to two independent
conditions (Ref. 31).

An illustration of the reduction of dimensionality is given by considering
two planes whose equations are

(1) $x - y + 3z - 4 = 0$
(2) $2x - y + 5z + 3 = 0$

In (1) only two of the values are independent; given x = 5 and y = 12,
the value of z is fixed as 11/3. Likewise in (2), given any two values for x
and y, z is determined. In each case there are 2 degrees of freedom. If
restrictions are imposed such that points which lie on both planes are
to be determined, then they must lie on the line of intersection of the two
planes. These points are determined by solving the equations for x and z
in terms of y, or for y and z in terms of x. Thus:

$y = -\frac{x + 29}{2}, \qquad z = -\frac{x + 7}{2}$

and any desired number of points on the line are obtainable.
Since there is only one independent variable, the number of degrees of
freedom is 1; the point can move up and down the line of intersection.
The dimensionality has thus been reduced to 1.
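
The reduction to one free variable can be checked numerically. The sketch below assumes the two plane equations read as x - y + 3z - 4 = 0 and 2x - y + 5z + 3 = 0, with the line of intersection parametrized by x; the function names are illustrative.

```python
from fractions import Fraction

def plane1(x, y, z):
    """Left-hand side of x - y + 3z - 4 = 0."""
    return x - y + 3 * z - 4

def plane2(x, y, z):
    """Left-hand side of 2x - y + 5z + 3 = 0."""
    return 2 * x - y + 5 * z + 3

def point_on_line(x):
    """Point on the line of intersection, with x as the one free variable."""
    x = Fraction(x)
    y = -(x + 29) / 2
    z = -(x + 7) / 2
    return x, y, z

# Every choice of x gives a point satisfying both equations exactly.
points = [point_on_line(x) for x in (-7, 0, 1, 5)]
on_both = all(plane1(*p) == 0 and plane2(*p) == 0 for p in points)
print(points[2], on_both)
```

Exact fractions are used so that the check does not depend on floating-point rounding.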

Statistical Interpretation. In its statistical application, the number
of degrees of freedom is the number of free variables in the problem or in
the distribution of the random variables connected with it. For each
restriction imposed upon the original observations, such as in the estimation
of a population value from the sample, the number of degrees of
freedom is reduced by 1.


It has been previously noted that the unbiased estimate of the population
variance from a sample of n observations is obtained by dividing the sum of squares
of deviations of the individual observations from their mean by n - 1,
which is the number of degrees of freedom. In this case, it is clear
that this is the number of deviations reduced by the number of parameters
estimated from the sample and used to establish the point from which the
deviations are measured. Here the mean is found from the
sample, and hence the number of degrees of freedom is one less than the
number of observations.

In the case of establishing a regression line among a distribution of 
observed values, the straight line will fit any two observations with no 
residuals. Thus, in fitting the least-square line to 25 observations there 
are 23 degrees of freedom. Two degrees of freedom have been used up in 
estimating the two parameters in the regression equation (see page 88). 
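
Both counts of degrees of freedom can be illustrated in a few lines. The helper names below are hypothetical and the data are made up for the illustration; the sketch shows the n - 1 divisor for the variance and the fact that a least-squares line through two observations leaves no residuals.

```python
def sample_variance(values):
    """Unbiased variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residual_degrees_of_freedom(n_observations, n_parameters):
    """Observations minus parameters estimated from the sample."""
    return n_observations - n_parameters

# Two observations: the fitted line passes through both, so residuals vanish.
xs, ys = [1.0, 3.0], [2.0, 8.0]
slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

variance = sample_variance([2.0, 4.0, 6.0])
df_regression = residual_degrees_of_freedom(25, 2)
print(slope, intercept, residuals, variance, df_regression)
```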

The principle that for each relationship imposed upon the original
observations there is a corresponding reduction in the number of degrees
of freedom originally available will be found to apply throughout
statistical procedures.
PROBLEMS

1. Show that the maximum likelihood estimate of the population reliability
coefficient, ρ, for the case of the split-test method is

$\hat{\rho} = \frac{2\sum X_iY_i - \frac{(\sum X_i + \sum Y_i)^2}{2N}}{\sum X_i^2 + \sum Y_i^2 - \frac{(\sum X_i + \sum Y_i)^2}{2N}}$

when $X_i$ and $Y_i$ denote the scores obtained by the ith individual on
the odd and even items of the test, respectively; N, the number of
pairs of values; and ρ, the correlation coefficient in the sampled population
of X and Y.

2. Set up the confidence interval with a confidence coefficient of 95 per
cent for ρ, the population correlation coefficient. The sample value
is r = .77, the correlation between scores on Miller's Analogies Test
and the Otis Intelligence Test for a random sampling of 50 graduate
students. Any of the following may be used: the exact tables of the
r-distribution (David, F. N., Tables of the Correlation Coefficient,
Biometrika Office, London, 1938); the transformation of r suggested
by Pillai (Pillai, K. C. S., Sankhya, Vol. 7, Part 4, pp. 415-422, July);
or the logarithmic transformation due to R. A. Fisher.
For the last, the probability statement, where a is determined from the
normal probability table corresponding to a given confidence coefficient α (say
0.95), is

$P\left\{\left|\tfrac{1}{2}\log_e\frac{1+r}{1-r} - \tfrac{1}{2}\log_e\frac{1+\rho}{1-\rho}\right|\sqrt{n-3} < a\right\} = \alpha$

⁵ For equivalence of degrees of freedom to orthogonal linear functions, see the
discussion of Analysis of Variance, Chapter X.
3. Given: $Y_E = .6570X + 33.76$

as the equation for estimating the score on a mid-quarter examination
from a knowledge of the score on Miller's Analogies Test.

Sum of squared x-deviations = 10,584.88
Sum of squared y-deviations = 9788.50
n = 50

Set up the confidence interval for $b_{yx}$ with a confidence coefficient of
99 per cent. Let $\beta_{yx}$ be the population value. The variable

$t = \frac{(b_{yx} - \beta_{yx})\sqrt{\sum (X - \bar{X})^2}}{s_{y \cdot x}}$

has Student's distribution with n - 2 degrees of freedom, where

$s_{y \cdot x}^2 = \frac{1}{n-2}\sum (Y - Y_E)^2$
4. With the aid of Nair's tables (Ref. 22) find the 95 per cent and 99 per
cent confidence intervals from the following values of the median:
(a) Median = 38, N = 25 (c) Median = 42, N = 229
(b) Median = 18, N = 25 (d) Median = 21, N = 219

5. Set up the 99 per cent confidence interval for the difference between
the percentages given below obtained in two public-opinion polls:
$n_1 = 3000$, $p_1 = .52$; $n_2 = 800$, $p_2 = .48$

6. Set up the 98 per cent confidence interval for the difference in
percentages obtained on the same sample:
68 per cent answered “yes”; 32 per cent answered “no”; n = 500

7. Plan in advance the size of sample necessary to provide from the
sample an estimate of P so that the confidence belt will be of breadth
about .05. Take a confidence coefficient of .95. The value of P
from the sample is .60. [See also: Finney, D. J., “Errors of Estimation
in Inverse Sampling,” Nature, Vol. 160 (1947), pp. 195-6.]

8. Set up the fiducial limits of the true mean difference based on the
data from the controlled experiment given in Problem 2, page 98.
Use a fiducial probability of .95.

9. Set up the fiducial limits of the variance of the distribution of
differences based on the data in Problem 2, page 98. Use a fiducial
probability of .90.

10. On a particular intelligence test a pupil received an I.Q. rating of 98.
On this test the standard error of an individual score is 4.51 I.Q.
points. Set up the confidence interval for the true score of the
pupil, using a confidence coefficient of 95 per cent.

11. Given: $Y_E = .6570X + 33.76$

which is the equation for predicting $Y_E$, the score on a mid-quarter
examination, from a knowledge of a score, X, on Miller's Analogies
Test.

r = .683, $s_x^2$ = 216.018, $s_y^2$ = 199.765, n = 50, $\bar{X}$ = 69.32

Determine the confidence interval (99 per cent) for mid-quarter
score for the following scores on Miller's Analogies:

(a) 99; (b) 69; (c) 27.

(d) Explain your answer to (a) above.

12. The following table (based on the 1940 Census) gives the percentage
of adults over twenty-five years of age by states who had not completed
more than four years of school:

State            Percentage | State              Percentage | State             Percentage
Iowa                  4.1   | Dist. of Columbia       8.2   | Rhode Island          13.7
Oregon                5.2   | Ohio                    8.4   | Maryland              15.3
Idaho                 5.2   | Nevada                  8.8   | West Virginia         16.5
Utah                  5.5   | Colorado                9.0   | Florida          (illegible)
Washington            5.9   | Wisconsin               9.4   | Texas                 18.8
Nebraska              6.0   | Illinois                9.6   | Arizona               19.4
Kansas                6.1   | Massachusetts          10.1   | Kentucky              20.2
Vermont               6.1   | Michigan               10.2   | Tennessee             21.7
Wyoming               7.1   | Missouri               10.3   | Arkansas              28.1
South Dakota          7.2   | North Dakota           10.8   | Virginia              23.2
Montana               7.4   | Connecticut            11.2   | North Carolina        26.2
Maine                 7.4   | New Jersey             12.0   | New Mexico            27.3
Minnesota             7.5   | New York               12.1   | Alabama               28.9
Indiana               7.7   | Pennsylvania           12.3   | (name illegible)      30.1
California            8.1   | Delaware               12.9   | (name illegible)      30.2
New Hampshire         8.1   | Oklahoma               13.5   | South Carolina        34.7
                            |                               | Louisiana             35.7

Problem: Set up the tolerance limits for years of schooling of adults
(take α = 90 per cent). How may the results be used in analyzing
a state's educational program?

13. Students of fiscal policies are invited to study the characteristics and
use of grant-in-aid apportionment formulas in relation to setting up
tolerance limits. (Cornell, Francis G., “Grant-in-aid Apportionment
Formulas,” Journal of American Statistical Association, Vol. 42
(1947), pp. 92-104.)

14. The following tabular data are to be used for the problems below:

SCORES OF 25 FRESHMAN STUDENTS ON TEST FORMS
A AND B OF A SCIENCE READING TEST

Student    Score on    Score on
No.        Form A      Form B
 1           18          21
 2           33          37
 3           38          44
 4           29          30
 5           64          63
 6           74          68
 7           33          36
 8           72          66
 9           58          51
10           56          57
11           28          89
12           71          76
13           53          53
14           39          40
15           37          42
16           29          27
17           58          68
18           20          26
19           65          71
20           28          31
21           16          23
22           50          44
23           29          32
24           46          54
25           36          35

Problems:

(a) Test the equivalence of the forms A and B of the reading test by
(1) Testing the equality of the standard deviations of the scores
on the two forms.
(2) Testing the equality of means, variances, and covariances of
the scores on the two forms.
(b) Determine the reliability of the reading test by calculating the
product-moment correlation coefficient, r.
(c) Determine the reliability of the reading test by getting the maximum
likelihood estimate.
(d)* Determine the sensitivity of the reading test.
(e) Calculate the standard error of measurement of an individual
score.

* This may be postponed until the analysis of variance method has been studied.

15. The following data are to be used in the problems below:

SCORES OF A RANDOM SAMPLE OF 25 STUDENTS ON
A COMPREHENSIVE EXAMINATION IN
COLLEGE BIOLOGY

Student    Score on items
No.        Odd       Even
 1         145       143
 2         179       175
 3         157       158
 4         172       178
 5          94       113
 6         140       143
 7         139       136
 8         243       234
 9         207       201
10         213       203
11         248       222
12         184       200
13         191       195
14         136       126
15         208       186
16         186       163
17         158       160
18         197       188
19         206       196
20         249       253
21         196       206
22         154       167
23         142       148
24         186       188
25         221       204

Problems:

(a) Before attempting the methods of determining the reliability of
the biology test, test the assumption regarding the means and
standard deviations on the two halves of the test.
(b) If the assumptions in (a) are fulfilled, determine the reliability
of each half of the test by calculating the product-moment correlation
coefficient.
(1) What are the assumptions underlying the use of the Spearman-Brown
formula?
(2) If the assumptions in (1) are fulfilled, estimate the reliability
coefficient of the whole test.
(c) If the assumptions in (a) are fulfilled, calculate the reliability
coefficient of the test by getting the maximum likelihood estimate.
(d) Calculate the standard error of measurement of an individual
score.

16. Calculate the reliability coefficient for the English examination by
using the method of rational equivalence. The examination of 297
items was administered to a group of 209 college students. The
following values were computed from the examination results:

$\bar{X} = 144.58$    $s_t^2 = 775.0656$
$s_t = 27.84$    $\overline{pq} = .245$
n = 297

17. Calculate the reliability coefficient for a mathematics test of 75 items 
administered to 35 students by using the analysis of variance method 
(Hoyt). The basic data are given in Tables A and B. 


TABLE A
NUMBER OF CORRECT RESPONSES TO EACH OF THE 75 TEST ITEMS


Item f| Item f | Item f| Item f| Item f | Item f| Item f| Item f 


22 1 11| 21 18| 31 7| 41 24] 51 9 61 16| 71 20 
25 12 25| 22 22| 32 9| 42 20| 52 8 62 16 | 72 16 
25 13 11 23 23 | 33 24| 43 26| 53 18| 63 1 73 23 
24 14 9| 24 17| 34 19| 44 9| 54 14 64 8 74 14 

8 15 17| 25 17| 35 26| 45 16| 55 10 65 20| 75 2 


17! 46 15| 56 6] 66 19 
27| 17 13| 27 14| 37 15| 47 4| 57 9| er 11 
11 18 19! 28 14| 38 13] 48 31] 58 11 68 9 
23 19 23 | 29 16| 39 22| 49 24] 59 7 69 14 
16 | 20 25| 30 25| 40 18| 50 22| 60 26 | 70 19 




TABLE B
TOTAL SCORES OF THE 35 STUDENTS

Score f | Score f | Score f | Score f | Score f


55 1 45 1] 36 1 30 1 25 2 
54 1| 44 2| 35 3 29 1 24 1 
52 2| 43 1 34 1 28 2| 23 1 
50 1 42 1 33 1 27 1 7 1 
48 1 1. 1 1 1 26 3 16 1 
47 1 1 


18. (a) Look up in some reference text or texts (Kelley, Truman L., 
Fundamentals of Statistics, for instance) the following methods 
of estimating correlation: 

(1) Biserial r 
(2) Point-biserial r 
(3) Biserial phi-coefficient 
(4) Correlation for a fourfold point-surface, or the phi-coefficient 
(5) Tetrachoric r 
(6) Coefficient of mean square contingency 
(7) Correlation ratio 

(b) Specify the types of problems for which each method in (a) is 
designed. 

(c) What assumptions underlie the use of each method? 
(1) How may these assumptions be tested? 

(d) Which of the approximate measures of relationship are convertible 
to the product-moment scale, and under what conditions? 

19. Evaluate the several statistics that are in use as indices of internal 
consistency in item analysis. 

20. Plan in advance, from the data in Problem 9, Chapter 5, page 100, 
the size of sample such that the probability will be .95 that the .99 
confidence interval of the mean will have a length less than four 
score units. 
CHAPTER VII 
NORMAL AND NORMALIZED DISTRIBUTIONS IN STATISTICS 


The assumption that measurements are distributed in normal prob- 
ability curves underlies much of statistical theory. The mathematical 
conditions for normality have been determined (Ref. 8). The best 
evidence of the fulfillment of these conditions in any particular case is 
that which is available in the observations. Sometimes, then, it is sig- 
nificant to show that observations are normally distributed or at least that 
the available evidence indicates a high probability of such a distribution. 

The Test of the Hypothesis of Normality. Standard statistical 
methods are available for testing the hypothesis of normality. The chi- 
square test of the goodness of fit of theoretical normal frequencies to 
observed frequencies is a general test of the normality of a distribution of 
measurements. The test based upon the criteria of Pearson is first pre- 
sented. Two criteria provide the basis of estimating the extent of agree- 
ment between an observed distribution and the normal distribution with 
respect to two characteristics, symmetry and kurtosis. 

The criterion for symmetry is $\sqrt{\beta_1} = \mu_3/\mu_2^{3/2}$. The criterion for 
kurtosis is $\beta_2 = \mu_4/\mu_2^2$. For the normal curve, $\sqrt{\beta_1} = 0$ and $\beta_2 = 3$. 
It is observed that these criteria involve the second, third, and fourth 
moments. They are not affected by the size of the unit of measurement 
employed and are measures of the shape of the unimodal frequency 
distribution. The measurement of the form of variation of the distribution 
is given in terms of symmetry and kurtosis, or the flatness of the mode. 

Pearson's Test of Normality. The steps in the process of fitting the 
normal curve to a series of observations by the method of moments are 
described in detail below. 


1. Calculate the first four moment coefficients. 

(a) Moments about the mean and origin of ungrouped data. If $X$ 
is the variate, measured from the origin; $\bar{X}$ is the arithmetic mean; and 
$N$ is the size of the sample; then the $s$th moment coefficient, $\mu_s$, about the 
mean is 

$$\mu_s = \frac{1}{N}\sum (X - \bar{X})^s \tag{7.01}$$

In practice, usually with machine calculation, it is convenient to calculate 
first the powers of the observed values of $X$ measured from the 
origin. Then the $s$th moment coefficient, $\mu_s'$, about the origin is 

$$\mu_s' = \frac{1}{N}\sum X^s \tag{7.02}$$

Then the first four moment coefficients about the mean can be found 
from those about the origin from the following equations: 

$$\begin{aligned}
\mu_1 &= 0 \\
\mu_2 &= \mu_2' - (\mu_1')^2 \\
\mu_3 &= \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3 \\
\mu_4 &= \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4
\end{aligned}$$

These equations may be obtained by expanding the binomial, $(X - \bar{X})^s$, 
and finding the mean for each term of the expansion, separately. 
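
As a worked instance of the expansion just described, the third moment can be written out term by term. This is only a sketch in the notation of this section, with $\mu_s'$ denoting the $s$th moment about the origin and $\mu_1' = \bar{X}$; the second and fourth moments follow the same pattern.

```latex
\begin{aligned}
\mu_3 &= \frac{1}{N}\sum (X - \bar{X})^3
       = \frac{1}{N}\sum \left( X^3 - 3\bar{X}X^2 + 3\bar{X}^2 X - \bar{X}^3 \right) \\
      &= \mu_3' - 3\mu_1'\mu_2' + 3(\mu_1')^2\mu_1' - (\mu_1')^3 \\
      &= \mu_3' - 3\mu_1'\mu_2' + 2(\mu_1')^3 .
\end{aligned}
```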

(b) Moments from grouped data. 

When the original observations are first grouped into a frequency 
distribution, it is assumed that all values in a class interval have the value 
of its central point. Thus if $n_t$ is the number of observational values 
in the $t$th class interval and $X_t$ is its central value, then the $s$th moment 
coefficient, say $V_s'$, is given by 


(7.03) 


$$V_s' = \frac{1}{N}\sum n_t X_t^s \tag{7.04}$$


The moment coefficients $V_s'$ for grouped data should then be reduced to the 
values $V_s$ about the mean by means of equations as follows: 


$$\begin{aligned}
V_1 &= 0 \\
V_2 &= V_2' - (V_1')^2 \\
V_3 &= V_3' - 3V_1'V_2' + 2(V_1')^3 \\
V_4 &= V_4' - 4V_1'V_3' + 6(V_1')^2V_2' - 3(V_1')^4
\end{aligned} \tag{7.05}$$


2. Calculate the adjustments for grouping errors. 

The assumption in grouped data is that the observations take the 
value of the mid-point of the class interval. This assumption can be 
more nearly fulfilled if corrections for grouping, known as Sheppard's 
corrections, are applied. No corrections are necessary in the first and 
third moments, since the effects of grouping tend to balance out. They 
are made in the second and fourth moments when the statistics are a 
system of areas and the height of the curve tapers off gradually at both 
tails. These corrections serve then to give a better estimate of the 
parameter values. The $s$th moment coefficients, $\mu_s$, with Sheppard's 
corrections, are 


$$\begin{aligned}
\mu_1 &= V_1 \\
\mu_2 &= V_2 - \tfrac{1}{12}h^2 \\
\mu_3 &= V_3 \qquad (h = \text{length of interval}) \\
\mu_4 &= V_4 - \tfrac{1}{2}V_2h^2 + \tfrac{7}{240}h^4
\end{aligned} \tag{7.06}$$
3. Calculate $\beta_1$ and $\beta_2$. 


$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}; \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} \tag{7.07}$$
If normal: $\sqrt{\beta_1} = 0$; $\beta_2 = 3$ (7.08) 


4. Test whether the obtained values of $\sqrt{\beta_1}$ and $\beta_2$ differ significantly 
from 0 and 3. 

The exact sampling distributions of $\sqrt{\beta_1}$ and $\beta_2$ when the population 
is normal have not been worked out, but E. S. Pearson (Ref. 18) has 
determined approximate empirical frequency curves from the moments 
of the sampling distributions. Tables giving values of $\sqrt{\beta_1}$ and $\beta_2$ are 
available by which it can be determined according to size of sample how 
much deviation may be expected from 0 and 3 due to random sampling 
errors alone. 

If either one or both of the criteria, $\sqrt{\beta_1}$ and $\beta_2$, differ significantly 
from the values for the normal curve, 0 and 3 respectively, the hypothesis 


TABLE 39 
THE COMPUTATION OF THE FIRST FOUR MOMENTS FOR USE IN DETERMINING PEARSON'S 
CRITERIA OF NORMALITY 


Group interval      f    x = (X - 144.5)/10      fx      fx²       fx³       fx⁴ 
     (1)           (2)          (3)             (4)      (5)       (6)       (7) 

229.5-239.5          3           9               27      243     2,187    19,683 
219.5-229.5         14           8              112      896     7,168    57,344 
209.5-219.5         31           7              217    1,519    10,633    74,431 
199.5-209.5         50           6              300    1,800    10,800    64,800 
189.5-199.5         56           5              280    1,400     7,000    35,000 
179.5-189.5         78           4              312    1,248     4,992    19,968 
169.5-179.5         75           3              225      675     2,025     6,075 
159.5-169.5         81           2              162      324       648     1,296 
149.5-159.5         81           1               81       81        81        81 
139.5-149.5         81           0                0        0         0         0 
129.5-139.5         77          -1              -77       77       -77        77 
119.5-129.5         53          -2             -106      212      -424       848 
109.5-119.5         46          -3             -138      414    -1,242     3,726 
 99.5-109.5         31          -4             -124      496    -1,984     7,936 
 89.5- 99.5         22          -5             -110      550    -2,750    13,750 
 79.5- 89.5         19          -6             -114      684    -4,104    24,624 
 69.5- 79.5         15          -7             -105      735    -5,145    36,015 
 59.5- 69.5          0          -8                0        0         0         0 
 49.5- 59.5          4          -9              -36      324    -2,916    26,244 
 39.5- 49.5          1         -10              -10      100    -1,000    10,000 
 29.5- 39.5          1         -11              -11      121    -1,331    14,641 

Total          N = 819                          885   11,899    24,561   416,539 




that the sample could be a random sample from a normal population is 
rejected. 

Problem VII.1. Testing the normality of a sample by Pearson's 
method. The fitting of the normal curve to a set of observations is 
carried out on a set of achievement-test scores of 819 students on a final 
examination in a college course in general zoology. The arithmetical 
labor is substantially reduced over that of following directly the process 
specified in Equation (7.04) by taking the origin near the center of the 
distribution and proceeding to work with the class interval as the unit. 
This is done by calculating the moments about the origin of the computation 
variable, $x$. The corrections indicated in Equation (7.06) can then 
be made, putting $h = 1$. The whole process is followed out as recorded 
in Table 39. 

We shall follow through the calculations in the order in which they 
have been presented in the preceding theoretical discussion. The mean 
and standard deviation of the distribution are as follows: 


$$\bar{x} = \frac{885}{819} = 1.08059$$
$$\bar{X} = 144.5 + 10.8059 = 155.3059$$


$$s_x = \sqrt{\frac{\sum fx^2}{N} - \bar{x}^2} = \sqrt{\frac{11{,}899}{819} - (1.08059)^2} = \frac{2996.0227}{819} = 3.65814$$
$$s_X = 36.5814$$


Step 1. Calculate moments about the origin of the computation 
variable: 


$V_1' = 1.080586$                 $V_2' = 11{,}899/819 = 14.52869$ 
$(V_1')^2 = 1.1676668$            $V_3' = 24{,}561/819 = 29.98901$ 
$(V_1')^3 = 1.26176386$           $V_4' = 416{,}539/819 = 508.5946$ 
$(V_1')^4 = 1.36344459$ 

Step 2. Calculate moments about $\bar{x}$: 


$$\begin{aligned}
V_1 &= 0 \\
V_2 &= V_2' - (V_1')^2 = 14.52869 - 1.16767 = 13.36102 \\
V_3 &= V_3' - 3V_1'V_2' + 2(V_1')^3 \\
    &= 29.98901 - 3(1.080586)(14.52869) + 2(1.26176386) \\
    &= 29.98901 - 47.098497 + 2.52352772 = -14.585959 \\
V_4 &= V_4' - 4V_1'V_3' + 6(V_1')^2V_2' - 3(V_1')^4 \\
    &= 508.5946 - 4(1.080586)(29.98901) + 6(1.1676668)(14.52869) - 3(1.36344459) \\
    &= 476.6694
\end{aligned}$$


Step 3. Correct the moments for grouping by Sheppard's corrections 
(for computation variable $x$, we have $h = 1$): 

$$\begin{aligned}
\mu_1 &= V_1 = 0 \\
\mu_2 &= V_2 - \tfrac{1}{12}h^2 = 13.36102 - .08333 = 13.27769 \\
\mu_3 &= V_3 = -14.585959 \\
\mu_4 &= V_4 - \tfrac{1}{2}V_2h^2 + \tfrac{7}{240}h^4 = 476.6694 - 6.68051 + .029167 \\
      &= 470.01806
\end{aligned}$$


Step 4. Calculate $\beta_1$ and $\beta_2$, or $\alpha_3$ and $\alpha_4$: 


$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-14.585959)^2}{(13.27769)^3} = .09088713$$
$$\sqrt{\beta_1} = \alpha_3 = \frac{\mu_3}{\mu_2^{3/2}} = -.3014$$


We refer to the tables of $\sqrt{\beta_1}$ (Ref. 18) and find that this deviation, 
-.3014, or one greater than this from $\sqrt{\beta_1} = 0$ or $\alpha_3 = 0$ for the normal 
curve, is to be expected less than once in 100 trials by random sampling 
from a normal distribution or population. Thus, the distribution under 
consideration deviates significantly from a normal distribution with 
respect to $\sqrt{\beta_1}$. 

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{470.01806}{(13.27769)^2} = 2.666$$


We refer to the tables of $\beta_2$ (Ref. 18) and find that the observed value 
of $\beta_2$ or one less than this value is to be expected less than 5 times in 
100 trials but more than 1 time in 100 trials in random sampling from a 
normal population. Thus, the present distribution deviates significantly 
at the 5 per cent level from a normal population with respect to $\beta_2$. 
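
The arithmetic of Problem VII.1 can be retraced with a short script. The following is a minimal sketch, not the author's procedure verbatim: the function name `pearson_criteria` and the list names are illustrative, and the two top-interval frequencies (3 and 14) are assumed values inferred from the fx column of Table 39. It applies Equations (7.04) to (7.07) with the class interval as the unit.

```python
# Grouped frequencies read from Table 39, listed from x = 9 down to x = -11,
# where x = (X - 144.5) / 10 is the computation variable (class interval = 1).
# Assumption: the two top-interval frequencies (3 and 14) are inferred from
# the fx column of the table.
FREQS = [3, 14, 31, 50, 56, 78, 75, 81, 81, 81, 77,
         53, 46, 31, 22, 19, 15, 0, 4, 1, 1]
XS = list(range(9, -12, -1))


def pearson_criteria(xs, freqs, h=1.0):
    """Return Sheppard-corrected moments and Pearson's criteria for grouped data."""
    n = sum(freqs)
    # Moments about the origin of the computation variable, Equation (7.04).
    v1p, v2p, v3p, v4p = (
        sum(f * x ** s for x, f in zip(xs, freqs)) / n for s in (1, 2, 3, 4)
    )
    # Moments about the mean, Equation (7.05).
    v2 = v2p - v1p ** 2
    v3 = v3p - 3 * v1p * v2p + 2 * v1p ** 3
    v4 = v4p - 4 * v1p * v3p + 6 * v1p ** 2 * v2p - 3 * v1p ** 4
    # Sheppard's corrections, Equation (7.06).
    mu2 = v2 - h ** 2 / 12
    mu3 = v3
    mu4 = v4 - v2 * h ** 2 / 2 + 7 * h ** 4 / 240
    return {
        "n": n,
        "mu2": mu2,
        "mu3": mu3,
        "mu4": mu4,
        "beta1": mu3 ** 2 / mu2 ** 3,
        "sqrt_beta1": mu3 / mu2 ** 1.5,  # carries the sign of mu3
        "beta2": mu4 / mu2 ** 2,
    }


result = pearson_criteria(XS, FREQS)
print(result)
```

The significance of the two criteria still has to be read from E. S. Pearson's tables (Ref. 18); the script only reproduces the moment arithmetic.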

Fitting the Normal Curve to a Set of Observations by the Use of 
Cumulants. In 1928, R. A. Fisher developed a new kind of symmetric 
function, the k-statistics, which possess the valuable property of giving 
particularly simple sampling formulas, obtainable directly by combinatorial 
methods, and removing most of the algebraic labor characteristic 
of the older methods. The k-statistics, $k_p$ ($p = 1, 2, \ldots$), are symmetric 
in the observations, $X_1, \ldots, X_N$, so that the mean value of $k_p$ is the 
$p$th cumulant, or $E(k_p) = \kappa_p$. 

Fisher's criteria to test for the departure from normality of an 
observed sample, known as the statistics $g_1$ and $g_2$, are calculated from the 
k-statistics, $k_1$, $k_2$, $k_3$, and $k_4$, which are in turn derived from the sums of 
powers, from the second through the fourth, of the deviations from the 
mean. The quantity $g_1$ is essentially a measure of asymmetry or skewness. 
The parameter $\gamma_1$ of which $g_1$ is an estimate is related to $\pm\sqrt{\beta_1}$ 
of Pearson's notation as follows: 


$$\pm\sqrt{\beta_1} = \gamma_1 = \frac{\kappa_3}{\kappa_2^{3/2}} \tag{7.09}$$

The quantity $g_2$ is a measure of the peakedness or flatness of the curve, 
that is, its kurtosis. The parameter $\gamma_2$ of which $g_2$ is an estimate is 
related to Pearson's $\beta_2$ in the following way: 

$$\beta_2 - 3 = \gamma_2 = \frac{\kappa_4}{\kappa_2^2} \tag{7.10}$$

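A convenient way of calculating the k-statistics is to get first a series 
of values $V_1$, $V_2$, $V_3$, $V_4$, defined as follows: 


$$\begin{aligned}
V_1 &= \frac{\sum X}{N} \\
V_2 &= \frac{\sum X^2}{N} - \bar{X}^2 \\
V_3 &= \frac{\sum X^3}{N} - 3\bar{X}V_2 - \bar{X}^3 \\
V_4 &= \frac{\sum X^4}{N} - 4\bar{X}V_3 - 6\bar{X}^2V_2 - \bar{X}^4
\end{aligned} \tag{7.11}$$

The k-statistics are then given by 

$$\begin{aligned}
k_1 &= V_1 \\
k_2 &= \frac{NV_2}{N - 1} \\
k_3 &= \frac{N^2V_3}{(N - 1)(N - 2)} \\
k_4 &= \frac{N^2(N + 1)V_4}{(N - 1)(N - 2)(N - 3)} - \frac{3N^2V_2^2}{(N - 2)(N - 3)}
\end{aligned} \tag{7.12}$$

If the sums of powers are calculated from grouped data, Sheppard's 
corrections for grouping may be applied as follows: 

$$k_2' = k_2 - \tfrac{1}{12}h^2; \qquad k_4' = k_4 + \tfrac{1}{120}h^4$$

However, these corrections should be used for purposes of estimation, not 
for testing significance. 

The statistics $g_1$ and $g_2$ are given by 


$$g_1 = \frac{k_3}{k_2^{3/2}} \tag{7.13}$$

$$g_2 = \frac{k_4}{k_2^2} \tag{7.14}$$


For samples from a normal population, $g_1$ is distributed normally 
about 0 with a sampling variance 

$$\sigma_{g_1}^2 = \frac{6N(N - 1)}{(N - 2)(N + 1)(N + 3)} \tag{7.15}$$

Similarly, $g_2$ is distributed normally about 0 with a sampling variance 

$$\sigma_{g_2}^2 = \frac{24N(N - 1)^2}{(N - 3)(N - 2)(N + 3)(N + 5)} \tag{7.16}$$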
Unless the divergence is marked, large samples are required to detect 
departure from normality, because the exact sampling distributions 
of the criteria are not known. 

Problem VII.2. Testing the normality of a sample by Fisher's 
method. An example of the method of testing normality by means of 
the g-criteria is given by applying it to a sample of the honor-point ratios 
(H.P.R.) of 302 freshmen in the University of Minnesota College of 
Agriculture. The calculations are set out in Table 40. 

We find that $t_1 = g_1/\text{S.E.} = .166$ and $t_2 = g_2/\text{S.E.} = -1.97$. Entering 
the normal table or the t-table with degrees of freedom = ∞, we find 
that the respective probabilities are .87 and < .05. Therefore we may 
conclude that the hypothesis of normality is rejected at the 5 per cent 
level. 
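
The calculation of Problem VII.2 can be checked in the same spirit. The sketch below is illustrative only: the names `fisher_g` and `two_sided_p` are hypothetical, the frequencies and computation-variable values are those read from Table 40, and the moments about the mean are computed directly rather than through the shortcut of Equation (7.11), which gives the same $V_2$, $V_3$, $V_4$. The probabilities use the normal curve, as in the text.

```python
from math import sqrt
from statistics import NormalDist

# Frequencies and computation-variable values read from Table 40
# (the lowest interval is entered at -7.5, per the table's footnote).
FREQS = [7, 1, 5, 14, 24, 22, 30, 31, 28, 28, 36, 20, 26, 7, 14, 5, 4]
XS = [-7.5, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8]


def fisher_g(xs, freqs):
    """Return g1, g2 and their ratios to the standard errors (7.12)-(7.16)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(xs, freqs)) / n
    # V2, V3, V4: second to fourth moments about the mean.
    v2, v3, v4 = (
        sum(f * (x - mean) ** s for x, f in zip(xs, freqs)) / n for s in (2, 3, 4)
    )
    # k-statistics, Equation (7.12).
    k2 = n * v2 / (n - 1)
    k3 = n ** 2 * v3 / ((n - 1) * (n - 2))
    k4 = (n ** 2 * (n + 1) * v4 / ((n - 1) * (n - 2) * (n - 3))
          - 3 * n ** 2 * v2 ** 2 / ((n - 2) * (n - 3)))
    g1 = k3 / k2 ** 1.5
    g2 = k4 / k2 ** 2
    # Standard errors from the sampling variances (7.15) and (7.16).
    se_g1 = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_g2 = sqrt(24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))
    return g1, g2, g1 / se_g1, g2 / se_g2


def two_sided_p(t):
    """Two-sided probability of a normal deviate at least as large as |t|."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


g1, g2, t1, t2 = fisher_g(XS, FREQS)
print(round(g1, 4), round(g2, 4), round(t1, 2), round(t2, 2))
print(round(two_sided_p(t1), 2), two_sided_p(t2) < 0.05)
```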

Special Treatment of Data to Secure Normal Distributions. Two 
alternatives are open to the research worker if he finds that his data do 
not conform to a certain model about which considerable is known and 
by the use of which the analysis is relatively easy to work out. He may 
develop a new model to which his data may conform, or he may transform 
his data to make them fit one of the conventional models. The first 
alternative is often a problem of considerable mathematical difficulty. 
Hence the second procedure is usually followed. 

In particular, the large part of statistical theory is built on the assump- 
tions that the observations are distributed normally and that the variance 
is constant. It is often important, therefore, for the research worker to 
show that his measurements are distributed normally or to transform 
them into a form that is normally distributed, or at least into a form 
that has the best possible chance of being so distributed. In some cases 
the normal probability curve gives a very close approximation to the 
observed facts. Although this is not often the case, it is usually possible 
to transform the original observations into some function of them so 
that the function will be distributed normally. In this way the processes 
in subsequent calculations become simplified and the results more comprehensive 
in application. For instance, if the mean and standard 
deviation of the normal distribution are known, the distribution is known 
exactly. If any obtained distribution of observations is established as 
normal, then the known properties of the normal model may be applied 
to it. Tests of significance become more valid and sensitive when the 
sampling distribution is normalized in case of original skewness. 

The linear scale seems to be used in taking observations almost automatically, 
as if it were the one unique scale used in nature. This scale 
may often be the most convenient way of representing the original 
observations, but it need not be for that reason the only way. Should 
measurements made in one way follow the normal law, other methods 

TABLE 40 
CALCULATIONS IN TESTING THE NORMALITY OF A DISTRIBUTION BY THE USE 
OF THE g-STATISTICS 


Honor-point ratios 
(H.P.R.)              f      X       fX       fX²        fX³          fX⁴ 

-1.24 to -1.00        7    -7.5*   -52.5    393.75   -2953.125   22,148.4375 
-0.99 to -0.75        1    -7       -7       49       -343        2,401 
-0.74 to -0.50        5    -6      -30      180      -1080        6,480 
-0.49 to -0.25       14    -5      -70      350      -1750        8,750 
-0.24 to -0.00       24    -4      -96      384      -1536        6,144 
 0.00 to  0.25       22    -3      -66      198       -594        1,782 
 0.26 to  0.50       30    -2      -60      120       -240          480 
 0.51 to  0.75       31    -1      -31       31        -31           31 
 0.76 to  1.00       28     0        0        0          0            0 
 1.01 to  1.25       28     1       28       28         28           28 
 1.26 to  1.50       36     2       72      144        288          576 
 1.51 to  1.75       20     3       60      180        540        1,620 
 1.76 to  2.00       26     4      104      416       1664        6,656 
 2.01 to  2.25        7     5       35      175        875        4,375 
 2.26 to  2.50       14     6       84      504       3024       18,144 
 2.51 to  2.75        5     7       35      245       1715       12,005 
 2.76 to  3.00        4     8       32      256       2048       16,384 

Total               302             37.5   3653.75    1654.875  108,004.4375 

* All cases in interval H.P.R. = -1.00. 


Calculations, following Equations (7.11) to (7.16): 

$\bar{X} = 37.5/302 = .1241722$          $N^2 = 91{,}204$ 
$\bar{X}^2 = .01541873$                  $(N - 1)(N - 2) = 90{,}300$ 
$\bar{X}^3 = .001914578$                 $(\sum X)^2 = 1406.25$ 
$\bar{X}^4 = .0002377373$ 
$3\bar{X} = .3725166$    $4\bar{X} = .4966888$    $6\bar{X} = .7450332$    $6\bar{X}^2 = .09251238$ 

Mean = .875 + .25(.124172) = .875 + .031043 = .906 

$N^2V_2 = N\sum X^2 - (\sum X)^2 = 1{,}103{,}432.50 - 1406.25 = 1{,}102{,}026.25$ 
$NV_2 = 3649.09354$;  $V_2 = 12.083091$ 

$N^2V_3 = N\sum X^3 - 3(\sum X)NV_2 - \bar{X}(\sum X)^2 = 499{,}772.25000 - 410{,}523.01875 - 174.61411 = 89{,}074.61714$ 
$V_3 = .976652$ 

$N^2V_4 = N\sum X^4 - 4\bar{X}N^2V_3 - 6\bar{X}^2N^2V_2 - \bar{X}^2(\sum X)^2 = 32{,}617{,}340.1250 - 44{,}242.3582 - 101{,}951.0803 - 21.6826 = 32{,}471{,}125.0039$ 
$V_4 = 356.027422$ 

$k_2 = 3649.09354/301 = 12.12328$ 
$k_3 = 89{,}074.61714/90{,}300 = .98643$ 
$k_4 = 32{,}795{,}836.25505/89{,}999 - 133{,}158.83/299 = 364.40223 - 445.34726 = -80.94503$ 

$g_1 = k_3/k_2^{3/2} = .98643/(12.12328)^{3/2} = .02337$ 
$g_2 = k_4/k_2^2 = -80.94503/146.97281 = -.5507$ 

Variance of $g_1$ = (6.04)(301)/[(303)(305)] = 1818.04/92,415 = .019673;  S.E. of $g_1$ = .1403 
Variance of $g_2$ = (24.16)(301)²/[(299)(305)(307)] = 2,188,920.16/27,996,865 = .078184;  S.E. of $g_2$ = .2796 

$t_1 = .02337/.1403 = .166$;  P = .87;  d.f. = ∞ 
$t_2 = -.5507/.2796 = -1.97$;  P < .05;  d.f. = ∞ 

would not be likely to lead to a similar distribution. For example, 
measurements of the volume of an object might be found to follow a 
normal distribution whereas measurements of the diameter would not. 
Here the measurement of the volume would be the more convenient to 
deal with. Since the method of measurement giving a normal distribu- 
tion, if it exists, is not known a priori, it is not likely that the appropriate 
method will be selected to begin with. 

The second condition that is often indicated or implied as a necessary 
condition for the unfettered use of statistics is the stability or at least the 
predictability of the variance. Methods of measurement or of trans- 
formations giving normal distributions are of special significance when 
the standard deviation is large in comparison with the mean. In cases 
where the standard deviation is small, the effect of any transformation 
is less and, when it is very small, negligible. Both a necessary and 
sufficient condition for the independence of the mean and standard 
deviation in samples is normality in the parent distribution. 

We now consider the nature and use of various transformations 
designed to normalize or stabilize variates so as to render their distribu- 
tions more amenable to treatment by statistical methods based on these 
conditions. 

T-Score. In the field of educational psychology, McCall (Ref. 16) 
converted the raw scores on a mental test of an unselected group of twelve- 
year-old children to T-scores. This transformation gives a normal 
distribution of T-scores. The process is illustrated in the transformation 
of the raw scores of 141 freshmen on a science test (Table 41). 

In columns (1) and (2) the raw-score frequency distribution is given. 
Column (3) gives the cumulative frequency up to the mid-point of the 
respective raw-score units; for example, in row 1, N = 133 + ½(8) = 137. 
In column (4) the cumulative percentages are listed; for example, in 
row 1, 137/N = 137/141 = 97.13. 

The values recorded in column (5) were obtained from the table of 
areas and abscissas of the normal curve (Table I, Appendix). Thus, in 
row 1 the abscissa value of a point, such that 97.13 per cent of the total 
area under the normal curve lies below the ordinate erected at that point, 
is found from the table to be 1.90. 

The T-score values in column (6) are obtained by multiplying each 
abscissa value by 10 and adding 50 to the product. Thus, in row 1, 
10(1.90) + 50 = 69. 

The T-score unit is defined as one-tenth of the standard deviation. 
The mean of the distribution of T-scores is 50 and the standard deviation 
is 10. 
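
The conversion described above can be sketched in a few lines. This is an illustration under stated assumptions rather than McCall's own procedure: the function name `t_scores` is hypothetical, the frequencies are those read from Table 41 (with the frequencies for raw scores 42 and 35 taken as 7, as implied by the cumulative column), and the abscissa is obtained from `statistics.NormalDist().inv_cdf` instead of a printed table, so intermediate percentages differ slightly from the tabled ones.

```python
from statistics import NormalDist

# Raw-score frequency distribution of Table 41, scores 50 down to 29.
# Assumption: f = 7 for raw scores 42 and 35, inferred from column (3).
SCORES = list(range(50, 28, -1))
FREQS = [8, 8, 7, 8, 8, 7, 6, 5, 7, 6, 6, 5, 9, 7, 7, 7, 7, 5, 5, 6, 4, 3]


def t_scores(scores, freqs):
    """Map each raw score to its T-score (mean 50, standard deviation 10)."""
    n = sum(freqs)
    below = 0  # number of cases scoring lower than the current raw score
    result = {}
    for score, f in sorted(zip(scores, freqs)):
        cum_mid = below + f / 2            # column (3): lower scores + half at score
        z = NormalDist().inv_cdf(cum_mid / n)  # column (5): normal abscissa
        result[score] = round(10 * z + 50)     # column (6)
        below += f
    return result


table = t_scores(SCORES, FREQS)
print(table[50], table[40], table[29])
```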

TABLE 41 
TRANSFORMATION OF RAW SCORES ON JOHNSON SCIENCE APPLICATION TEST OF 141 
FRESHMAN STUDENTS TO T-SCORES 


                     Scores lower + ½ 
                   those at given score    Values of abscissa 
Raw score     f                               in standard       T-score 
                      N      Per Cent          measure 
   (1)       (2)     (3)       (4)               (5)              (6) 

    50        8     137.0     97.13              1.90             69 
    49        8     129.0     91.46              1.37             64 
    48        7     121.5     86.14              1.09             61 
    47        8     114.0     80.82              0.87             59 
    46        8     106.0     75.15              0.68             57 
    45        7      98.5     69.83              0.52             55 
    44        6      92.0     65.23              0.39             54 
    43        5      86.5     61.32              0.29             53 
    42        7      80.5     57.07              0.18             52 
    41        6      74.0     52.47              0.06             51 
    40        6      68.0     48.21             -0.04             50 
    39        5      62.5     44.31             -0.14             49 
    38        9      55.5     39.35             -0.27             47 
    37        7      47.5     33.68             -0.43             46 
    36        7      40.5     28.71             -0.59             44 
    35        7      33.5     23.75             -0.71             43 
    34        7      26.5     18.79             -0.89             41 
    33        5      20.5     14.53             -1.06             39 
    32        5      15.5     10.99             -1.23             38 
    31        6      10.0      7.09             -1.47             35 
    30        4       5.0      3.55             -1.81             32 
    29        3       1.5      1.06             -2.30             27 


It is to be noted that measurements of the mental qualities of individuals 
may be made so that their distribution will be normal within the 
limits of sampling error. This result can be obtained for a large unselected 
homogeneous group of individuals usually by constructing a test 
or examination comprised of some very easy items, some very difficult 
items, and many items of average or intermediate difficulty. Of course, 
a test can be made to conform within limits to whatever shape of 
distribution is wanted by varying the difficulty of the test, the time 
allotment for administering the test, the system of weighting the scoring 
of items, the choice of the unit of measurement, and so forth. Furthermore, 
even if the examinations yield results that are normal for a homogeneous 
population, the same examination administered to a special 
group will likely give scores that are skewed, often as a consequence of 
selection or of the inappropriateness of the examination to the group 
tested. Whether a normal or some other type of distribution results 
from the test used, it is obvious that whatever knowledge is 
gained about the distribution, it concerns the distribution of the function 
of the trait used in the measuring process. This conclusion is valid 
because the measurement is indirect, that is, through the measurement 
of a functional relationship, the exact nature of which is unknown. Our 
measurements are only the manifestation of the underlying trait. The 
statement that the mental traits of man are or are not normally dis- 
tributed is unproved and unprovable. No amount of experimentation, 
for instance, could demonstrate that intelligence is normally distributed. 
Our knowledge of its distribution relates to the way in which the mathe- 
matical function we use in measuring intelligence is distributed. The 
frequency distribution of Binet I.Q.'s, for example, for a large homogeneous 
population is generally held to be normally distributed. How- 
ever, even here the extreme lower end of the distribution of I.Q.’s is not 
normal, since there is an excess of individuals with low I.Q.’s (see Ref. 19, 
page 102). Thus he who makes a test proceeds by first assuming that the 
trait is normally distributed and then by deriving measurements which 
will conform to this model. When the raw scores for a particular sample 
are found to be skew, one means of normalizing them is to convert them 
to T-scores. 

Only when a trait is measurable directly can the true nature of the 
distribution of the trait become known. Certain biometrical measure- 
ments made on random samplings from homogeneous populations may 
be normal. Wechsler (Ref. 21) collected available data for 89 measured 
traits and abilities of human beings. Certain linear measurements, such 
as stature, length of extremities, the various diameters of the skull, and 
certain of their ratios like the cephalic index, were the only distributions 
which might be regarded as normal, although even among these there 
was often considerable asymmetry. 

The Use of Probits in Testing the Normality of Transformations. 
The best method of transformation to secure normalization must usually 
be determined by trial and error. The success of any particular method 
can be determined by the application of the standard statistical methods 
previously described. However, a simple graphical method is available 
which can be used to find out which transformations are successful and 
in what respects other transformations are not. The method was 
developed for dealing with toxicological and other dosage-mortality data, 
particularly by Gaddum (Ref. 14) and Bliss (Refs. 3 and 4). Their 
method, that of probits, is first presented in its use for testing the normality 
of transformations. 

The probit is defined in terms of the normal equivalent deviation 
(N.E.D.), and is readily determined for any given percentage from the 
unit normal curve. The N.E.D. of a given percentage is the deviation 
(from the mean) equivalent to the given percentage of the area of the 
curve. In order to make all values positive, the probit is the value resulting 
from adding 5 to the normal equivalent deviation.¹ The probit 
values corresponding to given percentages can be read directly from 
Fisher and Yates's Table IX (Ref. 13). The graphical method consists 
in plotting the appropriate transformations of the observations as 
abscissas either on probability paper or against the corresponding probits 
as ordinates. If the individuals or experimental subjects vary in such 
a way that the measurements or transformed measurements of the experimental 
factor are normally distributed, the probit should be a linear 
function of the measurement or of its transformation. It is usually 
immediately apparent whether or not the plotted points are randomly 
distributed about a straight line. When they are so distributed, one can 
with practice draw a straight line among the points to fit satisfactorily 
for most practical purposes. 

It is possible to fit regression lines, and maximum likelihood estimates 
of the population parameter values of the mean and standard deviation 
can be obtained when more precise methods are needed. A straight-line 
probit graph fitted by eye provides the first approximation. Although 
graphical analysis is probably the most efficient method for selecting a 
suitable function, sometimes it is necessary to determine by computation 
whether a given transformation is effective or, alternatively, whether the 
departures from another mode of plotting deviate significantly from 
normality. The standard statistical tests for this purpose, the statistics 
$g_1$ and $g_2$, have been discussed previously. The first, $g_1$, measures the 
skewness of the presumed normal distribution and determines whether 
or not the chief trend of the points is truly linear; $g_2$ indicates whether 
the secondary trends and twists about the straight line are statistically 
significant. With a small number of observations, only large departures 
from a straight line will be statistically significant. These would 
have been recognized as obvious during graphic analysis, so that the 
computation may then be seldom worth doing. When the 
number is sufficient for making grouping advisable, the 
calculations leading to the testing of the agreement between observation 
and hypothesis may lead to results that are not [illegible]. 
The principal use of the graphical method just described is 
in its application to data giving the percentages corresponding to given values 
of the variable. However, the graphical method may also be used 
when more complete information is available (Ref. 3). When 
there are N observations of a given variable, one method is to arrange them 
according to size. Then the smallest observation is assigned a percentage 
of $100/2N$, and succeeding observations are assigned percentages of 
$300/2N$, $500/2N$, ..., $(2N - 1)100/2N$. These percentages are then changed 
to probits and each 


¹ Compare with the T-score, page 158. 

individual observation is plotted. When the data become sufficient, 
they are grouped and added cumulatively and the probits are then 
plotted against the points separating the groups. When the number of 
cases in a group is very small, it is preferable to plot the individual read- 
ings or to assume an even distribution of the observations over the range 
covered by the group. 

Again, if straight lines fit the data, the distributions are normal. The 
mean and standard deviation can be estimated fairly accurately from 
the graph. The reciprocal of the slope of the line gives the estimate 
of the standard deviation. The mean is the value of the abscissa for 
which the probit value (as ordinate) is 5. The customary technique for 
calculating a regression line is not appropriate when the experimental 
results are of the kind just described. The best estimates of the mean 
and standard deviation are obtained by using the ordinary methods 
directly on the transformed observations. When the original observations 
are grouped, the most convenient method may be to estimate these 
statistics from the moments of the distribution. 
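
The graphical reading of the mean and standard deviation can be illustrated numerically. The sketch below is an assumption-laden illustration, not the published procedure: the function names are hypothetical, the plotting percentages are taken as $(2i - 1)100/2N$ for the $i$th observation in order of size, and the sample is artificial, constructed to lie exactly on the quantiles of a normal curve with mean 40 and standard deviation 8 so that the probit graph is an exact straight line.

```python
from statistics import NormalDist


def probit(percent):
    """Probit = normal equivalent deviation of the percentage, plus 5."""
    return 5 + NormalDist().inv_cdf(percent / 100)


def plotting_probits(observations):
    """Pair each ordered observation with the probit of (2i - 1)100 / 2N."""
    ordered = sorted(observations)
    n = len(ordered)
    return [
        (x, probit(100 * (2 * i - 1) / (2 * n)))
        for i, x in enumerate(ordered, start=1)
    ]


# Artificial sample lying exactly on normal quantiles (mean 40, S.D. 8).
N = 9
sample = [40 + 8 * NormalDist().inv_cdf((2 * i - 1) / (2 * N)) for i in range(1, N + 1)]

points = plotting_probits(sample)
(x_low, p_low), (x_high, p_high) = points[0], points[-1]

slope = (p_high - p_low) / (x_high - x_low)  # probits per unit of measurement
sd_estimate = 1 / slope                       # reciprocal of the slope
mean_estimate = x_low + (5 - p_low) / slope   # abscissa at which probit = 5

print(round(mean_estimate, 3), round(sd_estimate, 3))
```

With real observations the points scatter about the line, and the slope would be taken from a line drawn by eye or from the moments of the transformed observations, as the text recommends.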

The method of probits also provides a general graphical method of 
normalizing distributions which may be applied when the scale on which 
the experimental results are measured is altogether arbitrary. If a 
smooth curve is drawn through the points of a random sample of observa- 
tions plotted against probits, the curve may be used to convert succeed- 
ing observations to a scale of probits. These transformed values are 
necessarily normally distributed. The validity of this procedure requires 
that the shape of the original curve and the variance of the transformed 
curve must be stable. An illustration of the application of this principle 
is given by Ferguson (Ref. 10) in his presentation of methods for the 
estimation of the limen and precision of separate items of a mental test. 
Finney (Ref. 11) applied the method of probit analysis to get the max- 
imum likelihood estimates of the two parameters from the data of 
Ferguson. 

The Logarithmic Transformation. Tt has been found that many 
moderately skew frequency distributions arising from empirical data or 
fulfilling certain theoretical conditions are reduced to normal curves when 
the original observations are transformed to X = log X. A logarithmic 
transformation of a variable may not only make the distribution more 
nearly normal but will often stabilize the standard deviation, that is, make 
it more or less independent of the original variable. This stabilizing 
tendency holds where it is found that the standard deviation of the 
original variable is roughly proportional to the mean, or where the vari- 
ance is proportional to the square of the mean. This fact makes the 
logarithmic transformation a powerful one. It has also been found 
useful in dealing with new material whose distribution is unknown 
(Ref. 6). 
There is also the theoretical justification which indicates that the 
log transformation for most scientific observations is probably preferable 
to employing no transformation at all. The normal law may predict 
negative observations. The fact that there are men of more than double 
the average weight implies the existence of other men with negative 
weight. In case of scores of enlisted men on the Army Alpha Intelligence 
Examination, the measures M - 2 S.D. and M - 3 S.D. give the
nonexistent scores of -12 and -49. When logarithmic transformations
of the observations are used, this difficulty does not occur. Measure- 
ments of the size of small bodies of the same shape may be based on the 
diameter or on the volume. If the distribution of the volumes is normal, 
that of the diameters will necessarily be skew, and vice versa. Again, 
the use of logarithms does away with the difficulty. If the logarithms
of the diameters are distributed in a normal manner with a standard
deviation $\lambda$, the logarithms of the volumes will be normally distributed
with standard deviation $3\lambda$ (Ref. 14).

The logarithmic transformation, then, should make easy the
interpretation of experimental results when the variations are large. It fre-
quently has a double advantage in making experimental results more 
consistent and in preventing excessive weight from being given to an 
occasionally large aberrant observation. Cochran (Ref. 6) indicated 
that the logarithmic transformation made no significant difference when 
the coefficient of variation was less than 12 percent. Natural logarithms, 
preferred by the mathematician, and common logarithms to the base 10, 
ordinarily liked better by the experimenter, give equally good results. 
Gaddum (Ref. 14) uses the symbol $\lambda$ to denote the standard deviation
of the logarithm to the base 10. It is worth noting that as a logical
consequence of the adoption of the method of logarithmic transformation,
the mean of the logarithms (or the geometric mean of the observations,
instead of the arithmetic mean) would be regarded as the most probable
value.

Gaddum (Ref. 14) gives general formulas for obtaining the mean and
standard deviation of the transformed distribution when the original
observations have been grouped on an arithmetic scale. These are

\[ \bar{X}' = \log_{10}\left(\frac{\bar{X}}{\sqrt{1 + \sigma^2/\bar{X}^2}}\right) \tag{7.17} \]

\[ \lambda^2 = 0.4343\,\log_{10}\left(1 + \frac{\sigma^2}{\bar{X}^2}\right) \tag{7.18} \]

where $\bar{X}$ and $\sigma$ are the mean and standard deviation, respectively, of the
original distribution. Gaddum points out that these estimates are
reasonably efficient only when $\lambda$ is less than 0.14 (Ref. 14), when an
estimate of $\lambda$ within 3 per cent can be obtained by dividing the coefficient
of variation (expressed in per cent) by 231.
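
As a hedged illustration of these formulas, the sketch below computes the mean and standard deviation of the logarithms from an arithmetic mean and standard deviation. It assumes that Equations (7.17) and (7.18) have the log-normal forms $\bar{X}' = \log_{10}[\bar{X}/\sqrt{1 + \sigma^2/\bar{X}^2}]$ and $\lambda^2 = 0.4343\log_{10}(1 + \sigma^2/\bar{X}^2)$; the function name and the numerical values are hypothetical.

```python
import math

def lognormal_from_arithmetic(mean, sd):
    """Estimate the mean and SD (lambda) of log10(x) from the arithmetic
    mean and SD of x, under the assumed forms of Eqs. (7.17) and (7.18)."""
    cv2 = (sd / mean) ** 2                       # squared coefficient of variation
    log_mean = math.log10(mean / math.sqrt(1.0 + cv2))
    lam = math.sqrt(0.4343 * math.log10(1.0 + cv2))
    return log_mean, lam

# Hypothetical distribution with a coefficient of variation of 20 per cent.
mean, sd = 50.0, 10.0
log_mean, lam = lognormal_from_arithmetic(mean, sd)

# Shortcut quoted in the text: coefficient of variation (per cent) / 231.
shortcut = (100.0 * sd / mean) / 231.0
print(f"mean of logs = {log_mean:.4f}, lambda = {lam:.4f}, shortcut = {shortcut:.4f}")
```

For this example $\lambda$ is well below 0.14, and the shortcut agrees with the formula to within the 3 per cent mentioned above.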

Gaddum proposed to call the distribution of x "log-normal" when the
distribution of log x is normal. He reports a number of studies which 
show that the log-normal distributions have been found in many fields of 
work. It is also indicated that its use could have facilitated interpreta- 
tion of data in certain studies in which difficulties were encountered. In 
Wechsler’s study (Ref. 21), for instance, the curves obtained for many 
of the measurements of human traits were just the kind which are 
improved by using the logarithmic transformation. Gaddum calculated
the values of $\lambda$ for some of Wechsler’s data. For example, the estimated
$\lambda$'s for weight (0.045 and 0.055) are about three times the $\lambda$'s for height
(0.015, 0.0164, 0.0172, 0.017).

Muhsam (Ref. 17) proposes the use of a "log-arith" grid for the study
of relative dispersions of distributions. The log-arith grid is a system of
rectangular coordinates in which the axis of abscissas is divided
logarithmically and that of the ordinates arithmetically. Generally,
distribution curves showing equal broadness on a log-arith grid have equal
relative dispersions. A broader curve indicates higher relative dispersion
while a narrower curve shows a lower one. This form of graphic
representation is particularly suitable in the case of log-normal distributions.

The Square Root and Inverse Sine Transformations. The present 
extensive use of the analysis of variance attaches special significance to 
the usefulness of transformations when there is reason to suspect that the 
theoretical conditions for the application of this technique are not
fulfilled. These theoretical conditions are that the experimental errors to
which the experimental data are subject are normally and independently 
distributed with the same variance. The logarithmic transformation 
just discussed equalizes the variance when it is proportional to the square 
of the mean. Therefore, this transformation is powerful for dealing with 
quantitative measurements, and it is used as a preparatory step to an 
analysis of variance when dealing with certain types of nonnormal data. 
The main objective in the use of this transformation is to ensure that the 
standard deviation, as calculated from a residual sum of squares, shall 
be applicable to the various "treatment" means, even though the means 
are different. The lack of normality of the distribution of the residual 
errors as observed in practice may be of secondary importance. Curtiss 


ers have only recently considered the problem of including in the experi- 
mental designs for collecting this type of data an objective estimate of the 
experimental errors to which the data are subject. The analysis of
variance, uniquely fitted to serve this purpose, was not originally planned
for use with percentages. The problem was one of discovering a
transformation for the original observations which would satisfy the
conditions on the experimental errors required in the analysis of variance.
The transformation used for this purpose is known as the inverse sine
transformation.


The inverse sine transformation applies to fractions or percentages
arising from the ratio of two small integers, when the experimental
errors follow the binomial frequency distribution. Before an analysis of
variance is performed, each percentage is changed to an angle $\theta$ so that
$\sin^2\theta = p$. As the fraction p varies from 0 to 1, or the observed
percentage, P, from 0 to 100, the angle $\theta$ changes from 0 to 90 deg. In large
samples, the sampling variation of $\theta$ tends to be normally distributed
with a variance dependent only on the number of observations, n, on which
the percentage is determined. The variance on the new scale is 821/n.
Fisher and Yates, in Tables XII and XIII of Ref. 13, provide tables for
converting percentages and fractions to degrees.
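
Where the tables are not at hand, the conversion can be computed directly. The following Python sketch assumes the transformation $\sin^2\theta = p$ with $\theta$ measured in degrees; the function names are hypothetical. The constant 821 is recovered as $(180/\pi)^2/4$.

```python
import math

def inverse_sine_degrees(p):
    """Angle theta in degrees such that sin^2(theta) = p, for 0 <= p <= 1."""
    return math.degrees(math.asin(math.sqrt(p)))

def angle_variance(n):
    """Approximate large-sample variance of theta (degrees squared) for a
    proportion based on n observations: (180/pi)^2 / (4n), about 821/n."""
    return (180.0 / math.pi) ** 2 / (4.0 * n)

for pct in (5, 25, 50, 75, 95):
    print(f"{pct:3d} per cent -> {inverse_sine_degrees(pct / 100.0):6.2f} deg")
```

Note that the variance depends on n alone, and not on p, which is the property that makes the transformed values suitable for the analysis of variance.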

For the sampling distribution of the estimated percentages or
proportions to be normal, the population value of p would be .50. For
values of the parameters departing widely from .50, as between 0 and .25
and between .75 and 1.00, the sampling distribution would be highly
skew. For determining measures of sampling errors of such distributions,
it is necessary to make a transformation of the observational values. The
inverse sine transformation is the one used here.

Likewise, for comparing the differences between percentages,
particularly where they deviate widely, as when one is in the tail and the
other near the center of the distribution, the inverse sine transformation
will render them more nearly comparable. Thus, the difference between
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they have different variances. With small whole numbers, treatment 
differences must be large before they can be significant. Moreover, the 
larger the treatment differences are, the greater the inequality in their 
variances is likely to be. 

The Poisson distribution is skew and hence there is a known relation
between the standard error and the mean. The theoretical variance of
the transformed values, the $\sqrt{x}$'s, is 1/4. The purpose of the transformation
is to change the data to a new scale in which the experimental variance
is approximately the same for all plots, thus making possible the use of all
plots in estimating the standard error of any treatment comparison.
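
The constancy of the variance on the transformed scale can be verified by direct summation. The sketch below assumes that the transformation in question is the square root of a Poisson count; the function name and the chosen means are hypothetical.

```python
import math

def sqrt_poisson_variance(mean, tail_sd=12):
    """Variance of sqrt(X) for X ~ Poisson(mean), by summing the
    probability function over a range covering essentially all the mass."""
    upper = int(mean + tail_sd * math.sqrt(mean)) + 20
    prob = math.exp(-mean)          # P(X = 0)
    e_root = e_x = 0.0              # running E[sqrt(X)] and E[X]
    for k in range(upper + 1):
        if k > 0:
            prob *= mean / k        # P(X = k) from P(X = k - 1)
        e_root += prob * math.sqrt(k)
        e_x += prob * k
    return e_x - e_root ** 2        # Var = E[X] - (E[sqrt X])^2

for m in (4, 25, 100):
    print(f"mean {m:3d}: variance of x = {m:3d}, "
          f"variance of sqrt(x) = {sqrt_poisson_variance(m):.4f}")
```

The variance of the counts equals the mean, whereas the variance of the square roots stays close to 1/4 once the mean is not too small.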

Normalizing Transformation for Ordinal or Ranked Data. In some 
types of experimental data, it may be possible or sufficient only to place a 
series of magnitudes in order of preference without knowledge of their 
metrical values. For example, in tests of psychological preferences, 
individuals may be able to express preferences but cannot assign numer- 
ical values to whatever forces may be operative in bringing about such 
preferences. Likewise, in the standardization of food products, an 
important factor is the determination of consumer preferences, which 
may be indicated by the ranking of a given set of products in order of 
choice. 

Where the assumptions underlying the order of ranking are fulfilled, 
namely, the assumptions that the underlying trait may be regarded as 
continuous and normally distributed, the transformation of ordinal data 
to a form that is amenable to further analysis (for instance, to the analysis 
of variance) sometimes may be definitely advantageous. The trans- 
formation needed is one which normalizes the data and can be obtained 
by assigning to each item in a series of given size a score equal to the 
expected value for an observation of corresponding rank in a normal 
population with zero mean and unit standard deviation. Tables have 
been prepared for series of all sizes from 2 to 50 items. Such a table is 
given by Fisher and Yates's Table XX (Ref. 13). Table XXI in the 
same source provides the sum of squares for the transformed score of 
each individual, substantially reducing the labor involved in running the 
analysis of variance. This type of analysis makes possible tests of 
differentiation in preference between classes of subjects of different sex, 
age, or other characteristics. 
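
When the tabled scores are not available, approximate normal scores may be computed. The sketch below is an assumption-laden substitute, not a reproduction of the Fisher and Yates table: it uses Blom's approximation, $\Phi^{-1}[(i - 3/8)/(n + 1/4)]$, for the expected value of the ith smallest of n normal observations, and the function name is hypothetical.

```python
from statistics import NormalDist

def approximate_normal_scores(n):
    """Approximate expected normal order statistics for ranks 1..n
    (rank 1 = smallest), using Blom's formula. These values are close
    to, but not identical with, exact tabled scores."""
    nd = NormalDist()               # zero mean, unit standard deviation
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

scores = approximate_normal_scores(5)
print([round(s, 3) for s in scores])
```

The scores are symmetric about zero, so that their sum vanishes and only their sum of squares is needed in the subsequent analysis of variance.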

Bliss (Ref. 5) gives a complete description of the technique for trans- 
forming ranks and of its application to a problem of testing consumer 
preferences. Sandon (Ref. 19) has prepared a nomograph for the scoring 
of rank data on school examinations. 


PROBLEMS 


1. Set up a list of statistical tools that depend for their efficiency upon 
the fulfillment of the conditions of normality of the measurements of 
the trait or characteristic in the population sampled.

2. What are the effects of nonnormality on the validity of tests of
significance: the z or F test, the two-tailed t-test, the one-tailed t-test?

3. Test the hypothesis of normality of the following distribution of scores
on the factual information test of the 1947 Minnesota State Board
Examination in Biology administered in a representative sampling of
56 Minnesota high schools (Anderson, 1949). Use the method of
Pearson.


Score Frequency Score Frequency 
25 1 12 173 
24 3 11 159 
23 24 10 129 
22 26 9 109 
21 73 8 49 
20 90 7 28 
19 122 6 18 
18 179 5 11 
17 206 4 6 
16 227 3 1 
15 255 2 1 
14 218 1 0
13 240 Total 2,348


4. Test the hypothesis of normality of the following distribution of
first-quarter honor-point ratios of a random sample of 122 students
in the College of Agriculture of the University of Minnesota. Use
the criteria of Fisher.

      H.P.R.          Frequency
   2.76 to  3.00          1
   2.51 to  2.75          2
   2.26 to  2.50          9
   2.01 to  2.25          3
   1.76 to  2.00          5
   1.51 to  1.75          8
   1.26 to  1.50         13
   1.01 to  1.25         14
   0.76 to  1.00         11
   0.51 to  0.75         14
   0.26 to  0.50         14
   0.00 to  0.25         11
  -0.24 to  0.00          9
  -0.49 to -0.25          3
  -0.74 to -0.50          2
  -0.99 to -0.75          1
  -1.24 to -1.00          2

      Total             122


5. Use the graphical method involving the use of probits for testing the
normality of the distribution of honor-point ratios in Problem 4.
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CHAPTER VIII 


STATISTICAL ANALYSIS OF DATA UNDER NONNORMAL 
ASSUMPTIONS 


A type of data met with at times, particularly in psychology, consists
of rankings which may arise from material not capable of quantitative
measurement on a variate scale but arranged in order according to some
qualitative characteristic. This might be, for example, the problem of
arranging musical compositions in the order of preference by a group of
students. Another problem consists in ranking according to two
variables: the arrangement of a set of musical compositions in the order of
preference by a group of professional musicians and by a group of
laymen. The relationship between the two sets of rankings is of interest.
Another type of data in this field would be produced by having a judge
rate individuals on a five-point scale according to some trait.
Transformations of these types of data are sometimes made. For example, the
ranked data may be transformed into normally distributed data as
described in Chapter VII. In another method the ranked data are
distributed into groups, so that the frequencies in the various groups
follow the normal scale. Scores on a linear scale are then assigned to the
groups. Further statistical treatment usually follows, such as computing
the product-moment correlation coefficient, using multivariate analysis
or factor analysis.
Before deciding to make such transformations, the critical investigator
will examine his data and the conditions under which they were collected,
to determine whether the assumptions underlying the transformation
can be reasonably accepted. He may find that they cannot be and hence
decide that a transformation is not warranted. There are a number of
simpler statistical methods available which do not require the
assumptions of the more elaborate methods suggested above. They enable a
direct attack to be made on the data. Some of these methods will now
be pointed out, particularly those whose value has been enhanced
by the development of means for testing their significance.

The Method of Rank Correlation. The rank-correlation method as
developed by Spearman is well known. It is recommended, however,
that the use usually prescribed for it in elementary texts on statistics
be abandoned. This use consists in assuming that Spearman's rho
may be used as a substitute for the product-moment correlation coefficient
by the aid of tables usually given to obtain the product-moment
equivalent. The formula due to K. Pearson, $r = 2\sin(\pi\rho/6)$, gives the
relationship between the product-moment coefficient, r, and the rank correlation,
$\rho$, when the variates are normal. The assumptions underlying the
equivalence, that there are no ties in rank and that the intervals between
successive ranks are equal, are not likely often to be found in practice.
The use of the rank correlation given here is that of a test of significance.

The Rank Correlation as a Test of Significance. Recent contributions 
to our knowledge of the rank correlation enable us to use it effectively 
as a test of the existence of correlation, that is, to test the hypothesis 
that the qualities under consideration are independent, or rather, that the 
judgments of them are independent (Ref. 4). Under such conditions, the 
pairs of rankings of n members drawn at random are independent. Thus, 
for large numbers of samples, every ranking of one quality will occur in 
equal frequencies with every ranking of another quality. If one ranking 
is fixed in the order (1, 2, . . . , n), it may be correlated with the n! 
possible permutations of these members. Thus, the exact probability 
that any correlation result could be due to random sampling errors can 
be calculated. 

This method of the calculation of probability values becomes laborious 
and practically prohibitive when n is of any substantial size. Olds 
(Ref. 10), however, has provided tables which give probability values 
to a close approximation. He tabled the probability values based upon 
the distributions of $\sum d^2$. The latter is simply related to $r'$, the rank
correlation, by the equation

\[ r' = 1 - \frac{6\sum d^2}{n^3 - n} \tag{8.01} \]

The rank correlation is of special value in testing significance when 
there is no knowledge of the form of the bivariate distribution or in the 
case where the form of the distribution is, or is believed to be, non-normal. 
It should be pointed out that scarcely anything is known about the sig- 
nificance of rank correlation in correlated populations. 
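
The exact test described above can be sketched for small n by enumerating all n! permutations. The Python code below assumes the usual relation $r' = 1 - 6\sum d^2/(n^3 - n)$; the function names are hypothetical, and the enumeration is practical only for small n, which is precisely why tables such as those of Olds are needed.

```python
from itertools import permutations

def sum_d2(rank_a, rank_b):
    """Sum of squared differences between two sets of ranks."""
    return sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))

def rank_correlation(sum_d_squared, n):
    """Rank correlation r' = 1 - 6*sum(d^2) / (n^3 - n)."""
    return 1.0 - 6.0 * sum_d_squared / (n ** 3 - n)

def exact_lower_tail(sum_d_squared, n):
    """P(sum d^2 <= observed) when all n! orderings are equally likely.
    Enumerates every permutation, so use only for small n."""
    base = tuple(range(1, n + 1))
    hits = total = 0
    for perm in permutations(base):
        total += 1
        if sum_d2(base, perm) <= sum_d_squared:
            hits += 1
    return hits / total

# With n = 12 and sum d^2 = 94 the coefficient itself is easily obtained:
print(round(rank_correlation(94, 12), 3))
# Exact probability of perfect agreement among 4 ranked items:
print(exact_lower_tail(0, 4))
```

A small value of $\sum d^2$ corresponds to a high positive correlation, so the lower tail of the distribution of $\sum d^2$ is the relevant one for testing positive association.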

Problem VIII.1. Testing the significance of rank correlation. An
example is presented to illustrate the test of significance of a rank
correlation coefficient by means of Olds’s Tables.

The ranks $R_1$ and $R_2$ were assigned to 12 individuals with respect to
two qualities with the results shown in the table at the top of page 171.

We enter Table V (Ref. 10, page 148) with n = 12 and $\sum d^2$ = 94; we
find the probability of not exceeding 94 by chance is between .02 and .01.
Therefore, we may conclude that there is a correlation between the two
rankings.

Problem VIII.2. Combination of the information from two tests of
significance. Another use to which the rank difference correlation may
be put is the combination of rank and contingency methods suitable for
utilizing simultaneously two kinds of information contained in group
data. Table 42, concerning first-year students entering one of the
colleges of the University of Minnesota, gives the number of those who
offered two credits in high-school mathematics and those who presented
more than two credits in mathematics at the various levels of rating on the
College Aptitude Test (C.A.T.).

TABLE 42
FRESHMAN STUDENTS CLASSIFIED ACCORDING TO COLLEGE APTITUDE RATING AND THE
NUMBER OF ENTRANCE CREDITS IN HIGH-SCHOOL MATHEMATICS

                               College aptitude percentile rating
Units in mathematics           1-25    26-50    51-75    76-100    Total
(a) Two years                   69      103      176      127       475
(b) More than two years         27       25       39       20       111
(a)/[(a) + (b)]                .719     .805     .819     .864      .811
Rank on C.A.T.                   1        2        3        4
Rank of the proportion           1        2        3        4      r' = +1

Two tests of significance, independent of each other, are applied to
the data: the chi-square test, $\chi^2$, and the rank-correlation coefficient, $r'$.
The chi-square test of the independence of the principles of
classification gives the following results: $\chi^2$ = 8.118, P = .046. The
rank-difference correlation, $r'$, between the two series, C.A.T. as one variable and
the proportion (a)/[(a) + (b)] as the other variable, gives $r'$ = +1, P = .042.

To test whether the aggregate of these two tests is significant, we have
the following data:

      P        -log_e P     Degrees of freedom
     .046       3.0791              2
     .042       3.1701              2
    Total       6.2492              4

$\chi^2$ = 2(6.2492) = 12.4984; .01 < P < .02. The probability of the
hypothesis of independence of college aptitude rating and the number of
units of high-school mathematics taken (two or more than two) is
approximately .014, by interpolation. Interpolation in the $\chi^2$-table for 4 d.f.:

      P        $\chi^2$       $\log_{10} P$
     .02      11.668     $\bar{2}$.30103
     .014     12.498     $\bar{2}$.14600
     .01      13.277     $\bar{2}$.00000
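
The combination of the two probabilities can be carried out without interpolation. The sketch below, in Python, takes the combined statistic to be $-2\sum\log_e P$ with two degrees of freedom for each test, as in the computation above; the function name is hypothetical. Because the number of degrees of freedom is even, the upper-tail probability of $\chi^2$ has a simple closed form, which the code uses in place of the table.

```python
import math

def combine_probabilities(p_values):
    """Combine independent significance levels.
    Returns (chi-square, degrees of freedom, combined probability)."""
    chi2 = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    half = chi2 / 2.0
    # Upper-tail chi-square probability; this closed form is valid
    # only because df is even.
    tail = math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))
    return chi2, df, tail

chi2, df, p = combine_probabilities([0.046, 0.042])
print(f"chi-square = {chi2:.4f} with {df} d.f., P = {p:.4f}")
```

The result agrees with the interpolated value of approximately .014.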


Problem VIII.3. Analysis of variation by the method of ranks.
Friedman (Ref. 1) has developed the method of ranks which was designed
to study variation by using ranked data instead of the original
quantitative values, avoiding the assumption of normality in the original data.
The method can also be used where the available data relate to order
only or to a qualitative character capable only of being ranked. This
method makes use of the statistic $\chi_r^2$, which is related to Kendall’s
coefficient of concordance W (see page 174) as follows:

\[ \chi_r^2 = m(n - 1)W \tag{8.02} \]

The distribution of $\chi_r^2$ tends to the $\chi^2$ distribution, as n tends to
infinity, with (n - 1) degrees of freedom. Some significance levels of
$\chi_r^2$ have been provided (Ref. 1).

TABLE 43
RANKS OF PERCENTAGES OF COLLEGE ATTENDANCE FOR SPECIFIED LEVELS OF COLLEGE
APTITUDE AND OF SOCIOECONOMIC STATUS

College aptitude        Ranks based on percentage of college attendance by
intervals               socioeconomic status
                        Below 15    15-18      19-22      23-26      27-30    Above 30
100                       ...
90-99                     ...
80-89                     ...
70-79                     ...
60-69                     ...
50-59                     ...
40-49                     ...
15-39                     ...
(a) Sum of ranks          40.5       34.5       21.5       ...        ...       ...
(b) Mean rank             5.063      4.313      2.688      ...        ...       ...
(c) Deviation             1.563      0.813     -0.812      ...        ...       ...
(d) Deviation squared     2.442969   0.660969   0.659344   0.316969   0.0625    5.640625

Theoretical mean = 3.5. Sum of deviation squared = 9.783376.

The example given in Table 43 shows the procedure of the method of
ranks. The data are given by Schultz (Ref. 13).

(1) The ranks were obtained by arranging in ascending order the
percentages of male high-school graduates for each row (the
college aptitude levels).
(2) The next step was to obtain the mean rank for each column
given in line (b).
(3) The third step was to obtain the difference between the mean
rank for each column and the theoretical mean 3.5, i.e., (p + 1)/2,
where p is the number of ranks.
(4) The sum of squares of the differences in (3) was obtained.
(5) Then $\chi_r^2$ was found as follows:

\[ \chi_r^2 = \frac{12}{np(p+1)} \sum_{j=1}^{p}\left(\sum_{i=1}^{n} r_{ij}\right)^2 - 3n(p+1) \tag{8.03} \]

where $r_{ij}$ is the rank entered in the ith row and the jth column; n
is the number of ranks averaged. Thus:

\[ \chi_r^2 = \frac{12(8)}{6(7)}\,(9.783376) = 22.365 \]

(6) The $\chi^2$-table is entered with 5 degrees of freedom.
(7) Since P is less than .01, it was inferred that there was a significant
association between socioeconomic status and college attendance
where college ability was controlled.

Wallis (Ref. 14) gives a formula for calculating the statistic, $\eta_r$, the
rank correlation ratio:

\[ \eta_r^2 = \frac{p(p+1)\chi_r^2/12}{np(p^2-1)/12} = \frac{\chi_r^2}{n(p-1)} \]

from which

\[ \eta_r^2 = \frac{22.365}{8(6-1)} = .5591 \tag{8.04} \]

and $\eta_r$ = .75.

Finally, the value of .75 is an estimate of the rank correlation ratio
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between socioeconomic status and percentage of college attendance when 
college aptitude is controlled. 
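
The steps of the method of ranks can be sketched in Python as follows. The code assumes the usual form of Friedman's statistic, $\chi_r^2 = \frac{12}{np(p+1)}\sum_j R_j^2 - 3n(p+1)$, where $R_j$ is the sum of ranks in column j, together with its equivalent expression in terms of squared deviations of the mean ranks; all function names are hypothetical. Since the individual ranks of Table 43 are not reproduced here, the example works from the printed sum of squared deviations.

```python
import math

def friedman_chi2(ranks):
    """chi-square_r from an n x p table of within-row ranks."""
    n, p = len(ranks), len(ranks[0])
    col_sums = [sum(row[j] for row in ranks) for j in range(p)]
    return (12.0 / (n * p * (p + 1)) * sum(s * s for s in col_sums)
            - 3.0 * n * (p + 1))

def chi2_from_deviation_sum(total_sq_dev, n, p):
    """Equivalent form: 12n / (p(p+1)) times the sum of squared
    deviations of the column mean ranks from (p + 1)/2."""
    return 12.0 * n / (p * (p + 1)) * total_sq_dev

def rank_correlation_ratio(chi2_r, n, p):
    """Rank correlation ratio eta_r = sqrt(chi2_r / (n (p - 1)))."""
    return math.sqrt(chi2_r / (n * (p - 1)))

# Printed total for Table 43: n = 8 rows, p = 6 ranks per row.
chi2_r = chi2_from_deviation_sum(9.783376, n=8, p=6)
print(round(chi2_r, 2), round(rank_correlation_ratio(chi2_r, 8, 6), 2))
```

The value obtained agrees with that of the text to the second decimal place; the small difference in the third decimal arises from rounding in the hand computation.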

The Case of Multiple Rankings. The problem arises in practice of 
how to determine the agreement among a number of rankings and how to 
obtain an estimate of a true ranking if a significant concordance among 
sets of rankings exists. This is the case when there are m rankings of n 
instead of two. For instance, a group of students might be asked to 
arrange the photographs of a number of persons unknown to them with 
respect to their judgments as to the unknown persons’ intelligence. It
is desired to test whether there is a community of judgments between the 
students. Of course this experiment is not equivalent to determining a 
relationship based on order of experimental findings. There could be a 
substantial agreement about an incorrect order which might be different 
from the one established by the score of a valid and reliable intelligence 
test. 

Problem VIII.3. Computing and testing the significance of the
coefficient of concordance. Let the following represent the rankings of
three observers of 8 objects, $A_1, A_2, \ldots, A_8$:


Objects 


Observer 


The sum of the sums of the ranks of the columns must be 108, that is,
mn(n + 1)/2, where m is the number of observers and n the number of
objects. If the concordance were perfect the sums would be 3, 6, 9, 12,
15, 18, 21, and 24, though not necessarily in that order. If there is
little or no agreement, the sums are approximately equal. The variance
of these sums gives a measure of the ranking concordance.

Kendall (Reference 9, page 411) derives a coefficient of concordance,
W, as

\[ W = \frac{12S}{m^2(n^3 - n)} \tag{8.05} \]

where S is the sum of the squares of deviations from the mean, m(n + 1)/2.

If agreement is perfect, then the sums of the columns are m, 2m, . . . ,
nm and the sum S is $m^2(n^3 - n)/12$. The range in values of W is
from 0 to 1.

In the example above,

Mean = m(n + 1)/2 = 3(8 + 1)/2 = 13.5
S = (18 - 13.5)² + (8 - 13.5)² + (4 - 13.5)² + (19 - 13.5)²
    + (15 - 13.5)² + (11 - 13.5)² + (19 - 13.5)² + (24 - 13.5)²
  = 320.00

\[ W = \frac{12(320)}{9(8^3 - 8)} = .85 \]
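
The computation of W may be sketched in Python as follows, assuming the definition $W = 12S/[m^2(n^3 - n)]$ with S the sum of squared deviations of the column sums from m(n + 1)/2. The function names and the small rank tables in the usage lines are hypothetical.

```python
def concordance_from_s(s, m, n):
    """W = 12 S / (m^2 (n^3 - n)) for m rankings of n objects."""
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

def concordance(rankings):
    """W computed from m rankings (rows) of the same n objects
    (columns); ties are assumed to be absent."""
    m, n = len(rankings), len(rankings[0])
    sums = [sum(r[j] for r in rankings) for j in range(n)]
    mean = m * (n + 1) / 2.0
    s = sum((t - mean) ** 2 for t in sums)
    return concordance_from_s(s, m, n)

# Value of S from the example in the text: m = 3 observers, n = 8 objects.
print(round(concordance_from_s(320, 3, 8), 2))
# Hypothetical extremes: perfect agreement and complete disagreement.
print(concordance([[1, 2, 3, 4]] * 3), concordance([[1, 2, 3], [3, 2, 1]]))
```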

To test the significance of an observed value of W, it is essential to 
determine the distribution of W (or, more conveniently, of S) in the 
population, which is obtained by permuting the n ranks in all possible 
ways in each of the m rankings. Kendall (Ref. 6) gives the distribution 
for some low values of n and m and indicates how to approximate for large
values through the use of a continuous distribution. The latter can be
done by the use of the z-distribution where

\[ z = \tfrac{1}{2}\log_e \frac{(m - 1)W}{1 - W} \tag{8.06} \]

and

\[ \nu_1 = (n - 1) - \frac{2}{m} \tag{8.07} \]

\[ \nu_2 = (m - 1)\left[(n - 1) - \frac{2}{m}\right] \tag{8.08} \]

In making this test for low values of m and n, it is desirable to apply the
usual correction for continuity by reducing S in Equation (8.05) by unity
and increasing the divisor by 2.

We shall illustrate by testing the significance of the obtained value,
W = .85:

\[ W_c = \frac{320 - 1}{378 + 2} = .84 \]

\[ z = \tfrac{1}{2}\log_e \frac{2(.84)}{1 - .84} = 1.1759 \]

\[ \nu_1 = \tfrac{19}{3}, \qquad \nu_2 = \tfrac{38}{3} \]

For $\nu_1$ = 6 and $\nu_2$ = 13: $z_{.001}$ = 1.0306.

Hence, for z = 1.1759, P < .001.
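
The arithmetic of this test can be sketched in Python. The code assumes that the statistic is $z = \frac{1}{2}\log_e[(m-1)W/(1-W)]$ with $\nu_1 = (n-1) - 2/m$ and $\nu_2 = (m-1)\nu_1$ degrees of freedom, and that the continuity correction reduces S by unity and increases the divisor $m^2(n^3-n)/12$ by 2; the function name is hypothetical.

```python
import math

def concordance_z(s, m, n, continuity=True):
    """Return (W, z, nu1, nu2) for testing the coefficient of concordance."""
    divisor = m ** 2 * (n ** 3 - n) / 12.0
    if continuity:
        s, divisor = s - 1.0, divisor + 2.0   # correction for continuity
    w = s / divisor
    z = 0.5 * math.log((m - 1) * w / (1.0 - w))
    nu1 = (n - 1) - 2.0 / m
    nu2 = (m - 1) * nu1
    return w, z, nu1, nu2

w, z, nu1, nu2 = concordance_z(320, m=3, n=8)
print(f"W = {w:.4f}, z = {z:.4f}, nu1 = {nu1:.2f}, nu2 = {nu2:.2f}")
```

The value of z obtained in this way differs slightly in the third decimal from the 1.1759 of the text, because the hand computation rounds the corrected W to .84 before taking the logarithm. The conclusion is unaffected.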

The estimate of the true ranking of the objects is intuitively given by 
taking as rank 1 that object whose sum of ranks is the least. In our 
problem that object is As, followed by objects Аз, Аз, As, As, Ал, Ay, and 
As This ranking is obtained by rearranging the 8 totals in rank order. 
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This solution is given a firmer theoretical basis by showing that it is 
“best” in a least-squares sense. If any two of the S's are equal, this 
method is indeterminate, and priority would be given to the object 
which has the lesser sum of squares of ranks. Where two objects have 
the same set of ranks, the specific ranking of each can be decided by 
tossing a coin or by selecting the ranks in a way most unfavorable to the 
hypothesis under test. An alternative solution might be obtained by 
splitting the ranks, giving each of the doubtful objects the same rank.
This method, however, introduces severe theoretical difficulties in making
tests of significance.

The Method of Paired Comparisons. In the method of paired com- 
parison, the observer compares each object with every other one. He 
indicates which object in a pair he prefers. This method was developed
in psychology in the late 1890’s. Its use, however, was limited to that
of a descriptive statistic. Recently, statistical methods have been
developed for testing the consistency of an individual's comparisons
and also of the agreement between observers or judges. These
developments should enhance the value of the method for research purposes,
particularly for the situations for which it has a unique value. In rank- 
ing, for example, if the quality under consideration is not measurable 
on a linear scale, the resulting ranking may give not only a faulty presen- 
tation of an observer's preference but also of the variation of the quality 
in the individuals. Thus in judging preferences in musical composition 
it is not unlikely that an auditor would judge A as preferable to B, B to C, 
and C to A. “Inconsistent” preferences of this kind could not occur in 
ranking, since, if A is placed above B, and B above C, then A is auto- 
matically placed above C. Cases also arise in which the judgments of 
untrained individuals are wanted who might be capable of comparing 
pairs of individuals with respect to some quality but would not likely be 
able to rank all the members of even a relatively small group. In animal 
experiments or in experiments with very young children, for example, in 
determining food choices, rankings would not be possible. But paired 
comparisons could be used by presenting the food in pairs and noting 
which food was eaten first. 

Coefficient of Consistence in Paired Comparisons. Kendall (Refs. 6 
and 8) gives a method of deriving a coefficient of consistence which 
indicates how consistent a judge or observer is in making preferences. 
If an individual observer produces a configuration of inconsistent prefer- 
ences, the reasons may be that (1) he is incompetent to judge, (2) the 
differences among the objects may be too small to detect, (3) the attention 
of the observer may wander during the experiment, (4) the quality under 
comparison may not be representable by a linear variable. 


With n objects, each of the possible pairs, $\binom{n}{2}$ in number, is presented to the
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subject and his preference of one member of the pair is noted. If the 
object A is preferred to B, it may be indicated as A → B. In general, if
an observer makes preferences of the type A → B → C → D → E → F
. . . there is no inconsistency, and this case corresponds to ordinary
ranking. The criterion of inconsistence is the "circular" triad. If the
n objects are considered as the vertices of a regular polygon of n sides
and each vertex is joined with every other one, the direction of the choice
can be indicated. Thus, if A is preferred to B, the symbol in the diagram
is A → B. Any triangle in the figure in which the arrows all point in the
same direction is a “circular” triad. Thus, if an observer makes
preferences of type A → B → C → A, the triad ABC is said to be inconsistent.

Kendall (Ref. 6) proved that the maximum possible number of
circular triads is $(n^3 - n)/24$ if n is odd and $(n^3 - 4n)/24$ if n is even; the smallest
number is zero. If d is the number of circular triads in an observed
configuration of preferences, he defines $\zeta$, the coefficient of consistence, as

\[ \zeta = 1 - \frac{24d}{n^3 - n} \quad (n \text{ odd}) \]

\[ \zeta = 1 - \frac{24d}{n^3 - 4n} \quad (n \text{ even}) \tag{8.09} \]

From these equations, it is observed that $\zeta$ is unity when there are no
inconsistencies in the configuration. As the coefficient decreases to zero,
the inconsistence, as determined by the number of circular triads, increases.
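
The counting of circular triads can be sketched in Python by examining every set of three objects. The code assumes the coefficient to be one minus the ratio of the observed number of circular triads to the maximum possible number, $(n^3 - n)/24$ for n odd and $(n^3 - 4n)/24$ for n even; the function names and the representation of preferences as a table of true and false entries are hypothetical.

```python
from itertools import combinations

def circular_triads(prefers):
    """Count circular triads. prefers[i][j] is True when object i
    is preferred to object j."""
    d = 0
    for a, b, c in combinations(range(len(prefers)), 3):
        one_way = prefers[a][b] and prefers[b][c] and prefers[c][a]
        other_way = prefers[b][a] and prefers[c][b] and prefers[a][c]
        if one_way or other_way:
            d += 1
    return d

def consistence(d, n):
    """Coefficient of consistence for d circular triads among n objects."""
    if n < 3:
        raise ValueError("at least three objects are required")
    denom = n ** 3 - n if n % 2 else n ** 3 - 4 * n
    return 1.0 - 24.0 * d / denom

# A consistent (rankable) configuration of four objects.
ranked = [[i < j for j in range(4)] for i in range(4)]
# The circular triad A -> B -> C -> A.
cycle = [[False, True, False], [False, False, True], [True, False, False]]
print(consistence(circular_triads(ranked), 4), consistence(circular_triads(cycle), 3))
```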

The next problem is to determine the statistical significance of $\zeta$, that
is, to answer the question: With what probability can an obtained value
of $\zeta$ arise by chance if the judge assigns his preferences at random in
relation to the quality under examination?

With n objects, the number of possible configurations of preferences
is $2^{\binom{n}{2}}$. Kendall discusses the procedure of investigating the distribution
of d in this population of $2^{\binom{n}{2}}$ different members, namely, the method of
proceeding from the distribution of n to that for (n + 1). He gives
tables of the frequencies and probabilities for the distribution of d
for n up to and including 7.

Coefficient of Agreement for m Observers. Kendall (Ref. 6) derived
a coefficient of agreement in which the judgments of m observers are
obtained by the method of paired comparisons. The coefficient u is
given by

\[ u = \frac{2\Sigma}{\binom{m}{2}\binom{n}{2}} - 1 \tag{8.10} \]
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where m = the number of observers, n = the number of objects judged, 


$\Sigma$ = total number of agreements between judges:

\[ \Sigma = \sum \binom{y}{2} \tag{8.11} \]

where y is the number in each cell.

The coefficient of agreement, u, is unity if and only if there is
unanimous agreement in the comparisons. Its minimum value is -1 only
when m = 2. Kendall gives tables which enable one to make an exact
test of significance of u for the following values of m and n: m = 3, n = 2
to 8; m = 4, n = 2 to 6; m = 5, n = 2 to 5; m = 6, n = 2 to 4. He has
also demonstrated that the $\chi^2$-approximation provides an adequate test
of significance for values of m and n outside the range of the tables.

The expression

\[ \frac{4}{m-2}\left[\Sigma - \frac{1}{2}\binom{n}{2}\binom{m}{2}\frac{m-3}{m-2}\right] \tag{8.12} \]

is distributed as $\chi^2$ with

\[ \nu = \binom{n}{2}\frac{m(m-1)}{(m-2)^2} \text{ degrees of freedom} \tag{8.13} \]


Problem VIII.4. Calculating the coefficient of agreement. A class
of 67 ninth-grade boys were asked to state their preferences with respect
to 9 school subjects. Each boy was asked to place an X in front of the
one member of each of the 36 pairs of subjects which interested him more
when he studied it. The preferences are shown in Table 44. The
problem is to determine the similarity of preferences among the boys.
The measure of agreement is the coefficient of agreement as given in
Equation (8.10).

TABLE 44
PREFERENCES OF 67 NINTH-GRADE BOYS IN 9 SCHOOL SUBJECTS*

Subject                     1    2    3    4    5    6    7    8    9   Totals
1. Physical Education      ..   41   55   56   58   56   58   57   62     443
2. Industrial Arts         26   ..   57   55   57   56   54   60   63     428
3. Literature              12   10   ..   28   36   38   36   40   60     260
4. Mathematics             11   12   39   ..   29   34   40   37   51     253
5. Social Studies           9   10   31   38   ..   34   40   40   51     253
6. Science                 11   11   29   33   33   ..   36   43   53     249
7. Spelling                 9   13   31   27   27   31   ..   34   48     220
8. (name illegible)        10    7   27   30   27   24   33   ..   47     205
9. Composition              5    4    7   16   16   14   19   20   ..     101
                                                              Total      2412

* This table is read by considering the subject at the left of each row as being
preferred y times over the subject at the top of the column which locates any particular
square, where y is the number in that square. For example, Physical Education is
preferred by 41 boys over Industrial Arts.

The calculations required are as follows: The calculation of $\Sigma$ as
given by Equation (8.11) can be shortened when the objects are arranged
in order of total number of preferences by using the following relation:

$$\Sigma = \sum y^2 - m \sum y + \binom{m}{2}\binom{n}{2} \tag{8.14}$$

where the summation is now carried out over the half of the table below
the diagonal. The numbers in this half being smaller than those in the
other half, the arithmetic is simpler.

$$\sum y^2 = 26^2 + 12^2 + \cdots + 20^2 = 17{,}914$$
$$\sum y = 26 + 12 + \cdots + 20 = 712$$
$$m \sum y = (67)(712) = 47{,}704$$
$$\binom{m}{2}\binom{n}{2} = (2211)(36) = 79{,}596$$

Hence, $\Sigma = 17{,}914 - 47{,}704 + 79{,}596 = 49{,}806$ and

$$u = \frac{2(49{,}806)}{79{,}596} - 1 = 0.2515$$


To test the significance of u, we calculate $\chi^2$ according to Equation
(8.12). Thus:

$$\frac{4}{67-2}\left[49{,}806 - \frac{1}{2}\binom{9}{2}\binom{67}{2}\frac{67-3}{67-2}\right] = 653.57$$

is distributed as $\chi^2$ with

$$\nu = \binom{9}{2}\frac{67(66)}{(67-2)^2}$$

or 37.7 degrees of freedom. The large value of $\nu$ justifies the use of
the normal approximation to the $\chi^2$-distribution. Then


$$\sqrt{2\chi^2} - \sqrt{2\nu - 1} = 27.5$$


This is a highly improbable result on the hypothesis of a random assign- 
ment of preferences. Therefore, the coefficient 0.2515 is statistically 
significant. It may be concluded that there is a certain amount of
agreement, though not a strong one, among the boys in their preferences
for school subjects.
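
The arithmetic of Problem VIII.4 can be retraced with a short script. The following is a minimal sketch in Python, assuming only the entries below the diagonal of Table 44 and Equations (8.10) and (8.12) to (8.14); the function name `coefficient_of_agreement` and the variable names are illustrative and do not come from the text.

```python
from math import comb, sqrt

# Entries below the diagonal of Table 44, row by row (subjects 2 to 9).
LOWER = [
    [26],
    [12, 10],
    [11, 12, 39],
    [9, 10, 31, 38],
    [11, 11, 29, 33, 33],
    [9, 13, 31, 27, 27, 31],
    [10, 7, 27, 30, 27, 24, 33],
    [5, 4, 7, 16, 16, 14, 19, 20],
]

def coefficient_of_agreement(lower, m):
    """Return (Sigma, u, chi2, nu, z) for m observers from the lower triangle."""
    n = len(lower) + 1                       # number of objects judged
    ys = [y for row in lower for y in row]
    pairs = comb(m, 2) * comb(n, 2)
    sigma = sum(y * y for y in ys) - m * sum(ys) + pairs             # Eq. (8.14)
    u = 2 * sigma / pairs - 1                                        # Eq. (8.10)
    chi2 = 4 / (m - 2) * (sigma - 0.5 * pairs * (m - 3) / (m - 2))   # Eq. (8.12)
    nu = comb(n, 2) * m * (m - 1) / (m - 2) ** 2                     # Eq. (8.13)
    z = sqrt(2 * chi2) - sqrt(2 * nu - 1)    # normal approximation to chi-square
    return sigma, u, chi2, nu, z

sigma, u, chi2, nu, z = coefficient_of_agreement(LOWER, m=67)
print(sigma, round(u, 4), round(chi2, 2), round(nu, 1), round(z, 1))
```

Run on the boys' data, the sketch gives $\Sigma$ = 49,806, u of about 0.2515, and about 37.7 degrees of freedom, in agreement with the worked figures; the same function could be applied to the girls' table in Problem 5 by replacing `LOWER`.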

Problem VIII.5. Measuring the consistence of choices by use of
paired comparisons. The distribution of circular triads of a random
sample of 15 ninth-grade boys and the coefficients of consistence for
preference in school subjects calculated from Equation (8.09) were as
follows:

Student Number     d     $\zeta$
      1            0     1.000
      2            0     1.000
      3            0     1.000
      4            0     1.000
      5            0     1.000
      6            0     1.000
      7            0     1.000
      8            0     1.000
      9            1     0.967
     10            1     0.967
     11            1     0.967
     12            1     0.967
     13            3     0.867
     14            3     0.867
     15            8     0.733

For 8 of the boys, there were no circular triads. Therefore, the
coefficients were 1.000; that is,

$$\zeta = 1 - \frac{24(0)}{9^3 - 9} = 1.00$$

For the remaining 7, there were 4 coefficients of value 0.967, 2 of
0.867, and 1 of 0.733.

It may be concluded that these students were able to give a consistent
set of choices of school subjects by use of paired comparisons. The
reader is invited to validate these conclusions by making the appropriate
tests of significance.
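
The tabulated coefficients can be checked directly. The sketch below assumes that Equation (8.09), which is not reproduced in this section, has the usual form for the coefficient of consistence: 1 - 24d/(n^3 - n) when n is odd and 1 - 24d/(n^3 - 4n) when n is even. The function name `consistence` is illustrative.

```python
def consistence(d, n):
    """Coefficient of consistence for d circular triads among n objects.

    Assumed form: 1 - 24d/(n^3 - n) for odd n,
                  1 - 24d/(n^3 - 4n) for even n.
    """
    denom = n**3 - n if n % 2 else n**3 - 4 * n
    return 1 - 24 * d / denom

# Nine school subjects, so n = 9 and the denominator is 720.
for d in (0, 1, 3, 4, 8):
    print(d, round(consistence(d, n=9), 3))
```

Under this assumed form, d = 1 and d = 8 give 0.967 and 0.733, as tabulated. The value 0.867 corresponds to d = 4, whereas d = 3 would give 0.900, so the entries for students 13 and 14 may be worth rechecking against the original data.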


PROBLEMS 
1. Before an examination, a teacher ranked her class of 25 students 
according to their expected achievements. After the examination, 
the rank was determined according to total score. What can be said 
about the teacher's estimation of the abilities of the students?

Student    Teacher's rank    Examination rank
   a             1                  5
   b             2                  1
   c             3                  9.5
   d             4                 22
   e             5                  4
   f             6                 16.5
   g             7                 11.5
   h             8                 19
   i             9                  9.5
   j            10                 21
   k            11                  7.5
   l            12                 24
   m            13                 14
   n            14                  7.5
   o            15                  2
   p            16                  3
   q            17                 25
   r            18                  6
   s            19                 16.5
   t            20                 15
   u            21                 20
   v            22                 23
   w            23                 13
   x            24                 18
   y            25                 11.5


2. Combine the information from two tests of significance, the chi-square
test and the rank correlation coefficient, applied to the data in Problem
11, Chapter V, page 100.
3. The following tabulation represents the rankings of 5 students based
on their preferences for four different musical compositions:

[Tabulation of rankings by composition not recoverable.]

(a) Compute the coefficient of concordance and test its significance.
(b) If a significant concordance among the sets of rankings is found,
combine the rankings to obtain the estimate of the true ranking.
4. The following data represent the rankings according to interests in 
high-school subjects of a random sample of 28 boys in the eleventh 
grade. The rankings were obtained by three different methods: (1) 
paired comparison, (2) order of merit, and (3) rating. 


                          Ranks of subjects

Method | [...] | [...] | Lit. | Math. | Soc. St. | Sci. | Spell. | Art | Comp.
  1    |   1   |   2   |  3   |   4   |    5     |  6   |   7    |  8  |   9
  2    |   1   |   2   | 4.5  |  4.5  |    3     |  6   |   7    |  8  |   9
  3    |   2   |   1   |  4   |   6   |    3     |  5   |   8    |  7  |   9


(a) Test the significance of the difference in ranks by the three 
methods. 

(b) If a significant association is found, estimate the amount of 
association among the three methods. 


5. The following tabulation shows the preferences of 67 ninth-grade girls 
in 9 school subjects: 


Subject                 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | Totals
1. [...]                | .. | 33 | 41 | 41 | 45 | 48 | 51 | 56 | 60 |   375
2. Home Economics       | 34 | .. | 38 | 38 | 41 | 48 | 50 | 50 | 59 |   358
3. Physical Education   | 26 | 29 | .. | 28 | 34 | 40 | 46 | 53 | 58 |   314
4. Spelling             | 26 | 29 | 39 | .. | 34 | 38 | 45 | 46 | 48 |   305
5. Mathematics          | 22 | 26 | 33 | 33 | .. | 39 | 45 | 45 | 41 |   284
6. [...]                | 19 | 19 | 27 | 29 | 28 | .. | 42 | 43 | 44 |   251
7. Social Studies       | 16 | 17 | 21 | 22 | 22 | 25 | .. | 36 | 36 |   195
8. Composition          | 11 | 17 | 14 | 21 | 22 | 24 | 31 | .. | 32 |   172
9. [...]                |  7 |  8 |  9 | 19 | 26 | 23 | 31 | 35 | .. |   158
Total                   |    |    |    |    |    |    |    |    |    |  2412


(a) Compute the coefficient of agreement u.
(b) Test the significance of u. 
(c) Compare the value of u for girls with the value of u for boys from 
the same school given in Table 44. 
6. Construct, administer, and analyze the results from a test designed to
measure the attitude and its intensity of a specified population toward
some pressing educational issue. (Consult Guttman, Louis, and
Suchman, Edward A., "Intensity and a Zero Point for Attitude
Analysis," American Sociological Review, Vol. 12 (1947), pp. 58-67.)


References

1. Friedman, Milton, "The Use of Ranks to Avoid the Assumption of
   Normality," Journal of the American Statistical Association, Vol. 32
   (1937), pp. 675-701.
2. Guttman, Louis, "An Approach for Quantifying Paired Comparisons and
   Rank Order," Annals of Mathematical Statistics, Vol. XVII (1946),
   pp. 144-163.
3. Guttman, Louis, "A Basis for Scaling Qualitative Data," American
   Sociological Review, Vol. IX (1944), pp. 139-150.
4. Hotelling, Harold, and Pabst, Margaret R., "Rank Correlation and Tests
   of Significance Involving No Assumption of Normality," Annals of
   Mathematical Statistics, Vol. VII (1936), pp. 29-43.
5. Kendall, M. G., "A New Measure of Rank Correlation," Biometrika,
   Vol. XXX (1938), pp. 81-93.
6. Kendall, M. G., The Advanced Theory of Statistics, Vol. 1. London:
   Charles Griffin & Company, Ltd., 1945.
7. Kendall, M. G., "Partial Rank Correlation," Biometrika, Vol. XXXII
   (1942), p. 277.
8. Kendall, M. G., and Smith, B. Babington, "On the Method of Paired
   Comparisons," Biometrika, Vol. XXXI (1939), pp. 324-345.
9. Kendall, M. G., and Smith, B. Babington, "The Problem of m Rankings,"
   Annals of Mathematical Statistics, Vol. X (1939), pp. 275-287.
10. Olds, E. G., "Distributions of Sums of Squares of Rank Differences
    for Small Numbers of Individuals," Annals of Mathematical Statistics,
    Vol. IX (1938), pp. 133-149.
11. Rosander, A. C., "The Use of Inversions as a Test of Random Order,"
    Journal of the American Statistical Association, Vol. 37 (1942),
    pp. 352-358.
12. Scheffé, Henry, "Statistical Inference in the Non-Parametric Case,"
    Annals of Mathematical Statistics, Vol. XIV (1943), pp. 305-332.
13. Schultz, Frank G., "Recent Developments in the Statistical Analysis
    of Ranked Data Adapted to Educational Research," Journal of
    Experimental Education, Vol. XIII (1945), pp. 149-152.
14. Wallis, W. Allen, "The Correlation Ratio for Ranked Data," Journal of
    the American Statistical Association, Vol. 34 (1939), pp. 533-538.
15. Wilks, S. S., "Order Statistics," Bulletin of the American
    Mathematical Society, Vol. 54 (1947), pp. 6-50.


CHAPTER IX 
SAMPLING THEORY AND PRACTICE 


We shall now attempt to make available to the reader some of the 
results from investigations about sampling from the point of view of their 
use in the construction of clearer, more concise, and better-organized 
designs of sampling surveys and experiments. It is expected that the 
reader will become able to extend and deepen his knowledge of sampling 
principles by further reading of the more technical accounts and to apply 
his knowledge to the particular scientific problems in which he is inter- 
ested. Although our chief interest here is in the empirical or observa- 
tional parts of applied statistical science, the theoretical part previously 
developed is basic. Here, as elsewhere in science, both the theoretical 
and empirical parts are essential: the progress of a science is dependent 
on their reciprocal influence and simultaneous advancement. 

The theoretical part of science is, presumably, based on exact ascer- 
tainments, and its purpose is to develop the structure, relationships, and 
results of hypotheses. The appropriateness and applicability of a con- 
ceptual model involve the confirmation or refutation by observation 
of the hypotheses which enter into the model. The hypotheses must be 
changed if they are not supported by experience and observation. An 
adequate scientific methodology evolves through comparisons and evalua- 
tions of scientific theories, both from the standpoint of their essential 
parts and their efficiency in practice. The more explicit the theory is, 
the more amenable it becomes to the detection of errors or deficiencies 
that it may possess. 

Observation is the basic process of empirical science. The empirical 
side of science obtains, criticizes, and systematizes the observations. 
It unites the observations with the theoretical propositions and in this 
process may reject the hypotheses of the theory, if found necessary. It 
should be remembered, however, that the empirical side of science is also 
directed by hypotheses. The specification of the conditions under 
which the observations are to be made and the form in which they are 
to be collected are governed or guided by theory. Within the reciprocal 
relationships, it is probably mutually advantageous that the speculative 
and the observational sides of science should work somewhat inde- 
pendently, each by its own special method. 

Although statistical theory has been concerned chiefly with random 
sampling, considerable resourcefulness, based perhaps chiefly on common 
sense and intuition, has resulted in the development of new and effective
systems of sampling designs. Much study is being given to the development
of needed statistical theory basic to estimating the relative efficiency
of different systems of sampling. Sampling is an excellent illustration 
of the link between theory and practice and of how difficulties are dis- 
covered and resolved as they arise in the problems met with in experience. 

From an early date, governments have engaged in the collection of
statistics of population, commerce, production, consumption, prices,
wages, income, and, more recently, of social need and human welfare.
Hence, statistics was originally political arithmetic to a great
extent. The standard method for the collection of these statistics
has been complete coverage and enumeration, of which the classical
example is the population census. Theoretically, at least for those
population characteristics which remain relatively constant, this
procedure appears to be the best. But such an undertaking is costly,
difficult to plan and conduct, limited to relatively few items of
information, time-consuming, and liable to be out of date by the time
the results are published. In fact, the government, even with its great
resources and facilities, can carry on complete censuses only at rather
long intervals. The exigencies of World War II required the collection
of many types of data, which could be done only by the use of sample
surveys. It is also worth noting that other governmental investigations
had at various times resorted to sampling. In the 1940 census, for
instance, the Bureau of the Census was able to broaden the scope of its
inquiries by including a set of supplementary questions which were
answered by a sample of 1 person in 20. Special sampling surveys for
securing statistical information are now often made by unofficial agencies
and by private individuals, usually to provide the lacking official
statistics.

In recent years we have witnessed the extension of sampling methods
to a great diversity of situations and for a variety of purposes, for
example:

(1) To find out [...] the most efficient particular pattern and
    location [...] an experiment in physics.
(2) [...] crop for studies in plant physiology, agricultural
    meteorology, and others.
(3) [...] of acreage devoted to a particular crop [...] yields from
    the economically more important crops.
(4)-(6) [...] crime [...] the effects of propaganda.
(7) To determine the frequency distribution of the length of
    sentences or other factors to characterize the styles of various
    authors.
(8) To investigate local government by examining the local laws
    in a few selected years over a 200-year period.
(9) To study methods of technical control in the manufacture of
    technical products.
(10) To ascertain the location and frequency of individuals having
    special talents, such as persons able to withstand the rigors of
    dive-bombing, or individuals with certain types of color-blindness
    that make them valuable as observers who can detect camouflage.


Most of these investigations would probably be impossible from the 
standpoint of expense, time, and utility of findings if it were necessary 
to investigate the whole field of inquiry in any detail. Furthermore, some 
investigations require destructive tests; hence, there would be no point 
to the investigation if the destruction of the whole were essential. 

Sampling, of course, is an everyday affair. From time immemorial 
it has played an essential role in carrying out common human activities. 
Primitive man who sampled food before he gave it to his children relied 
on the statistical principle of sampling without knowing that he did so 
or that such principles exist. The modern housewife relies on the quality 
of the sample before she purchases in quantity. 

Probably because of the rapidly increasing use of sampling in experi- 
mentation and in survey studies, rapid development is taking place in 
the theory and design of sampling investigations. 

Sampling Designs. The planning of sampling designs is usually 
involved in two situations: extensive survey studies, descriptive or 
analytical; and experimental investigations, which are more restrictive. 
In both situations the sampling problem is that of securing accurate and 
representative samples. A representative sample is one in which the 
measurements made on its units are equivalent to those which would be 
obtained by measuring all the elements of the population, except for the 
inaccuracy due to the limited size of the sample. 

The principal questions which relate to the setting out of an investiga- 
tion by sample are 


(1) What is the best size of the sampling units? 

(2) What number of sampling units should be used to secure the 
desired degree of precision in the estimates to be made? 

(3) What system of sampling will secure the optimum allocation of 
the sampling units among the population or its subdivision? 


Population. To answer these questions, certain assumptions about
the unknown population must be made. It is fundamental to use methods
of sampling and of estimation that are based on a minimum of
unavoidable assumptions and that also make unambiguous their exact
implications. It may be stated in advance that there is not one faultless 
method of sampling. The method to be used is contingent upon the 
nature of the material available and obtainable for the particular problem 
under investigation. 

In practice, most populations are finite in character; the universe is 
comprised of a finite number of members. The condition of an infinite
universe, one which contains an infinite number of members, is assumed
to be fulfilled in practice by sampling with replacement. A large part of
statistical theory is also built on the assumption that the universe is 
continuous, that the members or some measurable variable make up a 
continuous set. 

A population is called existent if all members can be enumerated or if
the members can be designated by a law of formation. For instance,
the inhabitants of the United States and the universe of positive integers
are existent universes. In cards and dice games and roulette, potential
universes consist of the millions of combinations of 52 cards, of the
millions of throws of a six-sided die, and the millions of turns of a roulette
wheel with its 37 numbers. These need only be imagined as hypothetical
universes. Likewise, a population of experiments is a hypothetical
universe.

The usual practice [...] being sampled as the population, the [...]
of a population or universe is a necessary first step in an investigation
based on samples. The definition of the population to be covered in the
investigation is an integral part of the statement of the purpose of the
study.

Randomness. The concept of randomness [...] theory and practice, but
it is rarely if ever defined, except perhaps in [...]
inference is strictly valid only for random samples. It is also a matter of
great practical and scientific importance to determine whether the fluctu- 
ations manifested by a series of observations are random in character or 
whether they may be assumed to be the outcome of some factor operating 
under a definite law. 

Testing for randomness is an important problem in quality control 
of manufactured products and also of special importance in the analysis 
of time series. The need for such tests has resulted in considerable 
research for criteria of randomness. 

Bias. If a sample has been chosen from a population in such a way 
as not to be a random sample, then no valid estimate can be made from 
it of a population parameter.¹ If a sample has been selected by a random
method, it gives a result that progressively approaches the population 
value as the sample is increased in size, assuming that an unbiased method 
of estimation has been used. If the results obtained are too high or too 
low, then the sample is called biased. The difference between the value 
determined by a very large sample and the parameter or population 
value is termed an error of bias. 

Errors of bias follow no known laws by which their amount might be 
estimated. Errors of bias are incorporated, therefore, with random 
errors and may thus result in spurious estimates of the latter. In sam- 
pling designs every caution is necessary to avoid errors of bias. Even 
if an efficient method of sampling has been used, errors of bias may 
arise in a number of ways. For instance, biases have been observed 
in sampling surveys of households where nobody was found at home 
when the interviewer called for the first time. The smaller the 
family, the smaller are the odds that some one will be at home. Unless 
the visits are continued until complete enumeration is obtained, errors of 
bias will arise in connection with size of families and other characteristics 
associated with it. Other instances of bias in sample surveys may be 
traced to factors such as bias and irregularity in the interviewer, imperfec- 
tions in the design of the questionnaire, and errors arising from non- 
response on the part of the interviewee. 

A classical example of bias arising from an unrepresentative selection
of respondents and from the erroneous belief that a large sample could
overcome such an error is furnished by the attempt of The Literary
Digest in 1936 to predict the results of the Presidential election.
Approximately ten million post cards were mailed to people whose names
were listed in telephone directories and in files of owners of automobiles.
Of the 2,350,176 replies received, only 40.4 per cent were in favor of
Franklin D. Roosevelt for President. In the election, he received 60.7
per cent of the votes cast. The error of bias was, therefore,
approximately 20 percentage points. The sample was biased in that the
respondents did not constitute a random sample of those citizens who
voted in this election.

Questionnaire studies in which the sample selects itself, voluntary
replies to requests for opinions on some controversial issue, and letters
written to editors of newspapers are all likely to represent mainly
persons who have strong views on the issues one way or another.

¹ In systematic sampling, for instance in stratified sampling, the number
of elements to be selected from any stratum must be selected at random.
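
The point that a large sample cannot overcome a biased method of selection can be illustrated by simulation. The sketch below is purely hypothetical: the electorate, the share of voters reachable through the mailing lists, and the preference rates are invented figures chosen only to resemble the situation described, and are not data from the 1936 poll.

```python
import random

random.seed(1936)

# Hypothetical electorate. Each voter is (listed_in_frame, favors_candidate).
# All rates below are invented for illustration.
N = 200_000
population = []
for _ in range(N):
    in_frame = random.random() < 0.40        # reachable through the mailing lists
    p_favor = 0.40 if in_frame else 0.75     # listed voters lean against the candidate
    population.append((in_frame, random.random() < p_favor))

def share(votes):
    """Proportion of favorable votes in a list of booleans."""
    return sum(votes) / len(votes)

all_votes = [favors for _, favors in population]
frame_votes = [favors for in_frame, favors in population if in_frame]

true_share = share(all_votes)
big_biased = share(random.sample(frame_votes, 50_000))   # large, but drawn from the lists only
small_random = share(random.sample(all_votes, 1_000))    # small, but strictly random

print(f"population {true_share:.3f}, biased n=50,000 {big_biased:.3f}, "
      f"random n=1,000 {small_random:.3f}")
```

In this sketch the sample of 50,000 misses the population value by roughly the built-in bias of the list, while the random sample of 1,000 falls within a few percentage points of it.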


SYSTEMS OF SAMPLING 


The origin of the sampling problem is in the necessity of estimating
certain characteristics of a population usually so large that it is practically
impossible to examine every member of the population, or so large that 
the time and cost required to do so would prohibit the undertaking. 
In this undertaking, it is essential to consider how best to take the sample 
and to obtain the estimates, and with what precision the estimates have 
been made. The fundamental statistical problem is, therefore, that of 
estimation. 

Unrestricted Random Sampling. A particularly simple form of 
sampling technique is illustrated by the classical urn problem. By 
counting the number of balls of each color in the sample drawn from the 
urn, the relative proportion of balls of different colors in the sample is 
determined. From these proportions the color composition of the balls 
in the urn is inferred. By using the properties of the familiar binomial 
or multinomial distributions, the margin of error of the estimate can also 


be calculated. 
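
The urn calculation can be sketched in a few lines of Python. The composition of the urn (30 per cent red among 100,000 balls) and the sample size of 1,000 are assumptions made only for the illustration; the margin of error uses the binomial standard error, which is a reasonable approximation here because the sample is a small fraction of the urn.

```python
import random
from math import sqrt

random.seed(0)

# Hypothetical urn: 30 per cent red among 100,000 balls (invented figures).
urn = ["red"] * 30_000 + ["white"] * 70_000
sample = random.sample(urn, 1_000)             # unrestricted random draw

p_hat = sample.count("red") / len(sample)      # sample proportion of red balls
se = sqrt(p_hat * (1 - p_hat) / len(sample))   # binomial standard error
margin = 1.96 * se                             # approximate 95 per cent margin of error
print(f"p_hat = {p_hat:.3f} +/- {margin:.3f}")
```

With 1,000 draws the margin of error is a little under three percentage points, which indicates the order of precision that a sample of that size can be expected to give for a proportion near 0.3.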

An analogous situation in principle might be the estimation of the
occupational classification of the 16 to 17 million men, twenty-one
to thirty-six years of age, who in 1940 registered in accordance with
the Selective Service Act. Let us assume that each individual had a
registration number which was written on a paper and enclosed in a
separate capsule and that all capsules were placed in a container utilizing
compressed air to secure a constant rotation. One thousand capsules
would be drawn at random and the corresponding occupations ascertained.
In order that statistical principles might be used in a valid way,
it is fundamental that each member of the sample should be chosen
strictly at random, which means a method of selection by which each
member of the population has an equal and independent chance of being
included in the sample, and that the method of selection is completely
independent of the characteristics to be [...] sampling, sometimes
called unrestricted or the unitary [...]. This method is regarded as
being capable [...]

Systematic Sampling Methods. In contrast to the method of simple
random sampling, a number of methods have been developed which
may be called systematic methods. These methods utilize prior knowledge
of the individuals comprising a universe with the view to increasing
accuracy and representation of samples. They generally use more
complex forms of random sampling called representative sampling.

Stratification. One of these systematic methods is based on the use 
of knowledge of population characteristics, first to divide the population 
into more homogeneous groups or strata and then to select at random 
the sampling units from each of these groups. This method has been 
called restrictive random sampling or the method of stratification. It is in 
effect a weighted combination of random subsamples. Various principles
have been used to distribute the sampling units among the several
strata. One, called stratified proportionate sampling, is based on the 
distribution of sampling units purely proportional to the total number of 
units in each stratum. In simple random sampling this proportion is 
left to chance. Another basis is to take the number of sampling units 
per stratum proportional to the product of the number of sampling 
units in the stratum by their standard deviations. 
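
The two bases of allocation just described can be compared numerically. In the sketch below the stratum sizes, standard deviations, and total sample size are invented, and the function names `proportionate` and `size_times_sd` are illustrative labels rather than terms from the text.

```python
def proportionate(total_n, sizes):
    """Allocate total_n sampling units in proportion to the stratum sizes."""
    big_n = sum(sizes)
    return [total_n * n_h / big_n for n_h in sizes]

def size_times_sd(total_n, sizes, sds):
    """Allocate in proportion to stratum size times stratum standard deviation."""
    weights = [n_h * s_h for n_h, s_h in zip(sizes, sds)]
    total_w = sum(weights)
    return [total_n * w / total_w for w in weights]

# Hypothetical strata (sizes and standard deviations are invented).
sizes = [5000, 3000, 2000]
sds = [4.0, 10.0, 20.0]

prop_alloc = proportionate(500, sizes)
sd_alloc = size_times_sd(500, sizes, sds)
print(prop_alloc)
print([round(a, 1) for a in sd_alloc])
```

In this illustration the smallest stratum, being the most variable, receives the largest share of the sample under the second basis, whereas proportionate sampling gives it the smallest share.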

Stratified sampling is used in the Gallup polls of public opinion in
order to secure representative proportions of various classes of people 
rather than to rely on the chance determination of these proportions. 
In the interviews that are made, each subject supplies sufficient informa- 
tion about himself to permit classification according to (1) part of the 
country, (2) the urban or rural district, (3) socioeconomic status, (4)
political affiliation, (5) age, (6) sex. The particular type of stratification 
used depends on the problem under inquiry. 

While some progress has been made, the methods in use for predicting 
elections are not yet scientific. Among other hazards, the sample design 
may reflect erroneous judgments as to the factors (used for controls in 
stratification) truly associated with the characteristic under investigation. 
Serious biases may also be introduced because the selection of the sam- 
pling units within a stratum to be interviewed is not done at random, mak- 
ing it impossible to obtain an unbiased measure of sampling error from the 
internal evidence of the responses themselves. Furthermore, the 
population composed of eligible citizens who subsequently go to the polls 
and vote is difficult, if not quite impossible, to specify in advance of 
sampling and the trait itself is susceptible to change without notice. 

Cluster Sampling. The method of stratified sampling is also used
where the unit of sampling is a group rather than the individual. This
method, sometimes called cluster sampling, is especially important in the
study of human populations when the individuals are often grouped (as
by families, inhabitants of single houses or apartment houses or of
blocks, and so on) as in the census, for instance, and it becomes very
difficult to sample individuals at random under such circumstances.
Most uses of this method apply a system of “exclusive units,” where no
individual or group is included in more than one sampling unit.
Mahalanobis (Ref. 19) has used a variant of the method called the “zonal
configurational” type or the “overlapping system of grid sampling,” in 
which the same individual or group may form a part of more than one 
sampling unit. He points out that this method is analogous to sampling 
from an urn with replacement. 

Purposive Selection. A method of systematic sampling essentially 
different in principle is that which is called purposive selection. Instead 
of making a random selection of the sampling units within strata, this 
method selects such groups of units that have the weighted sample 
means of certain characteristics, the controls, in close agreement with the 
population values. This method might save time and labor at times. 
However, it has often proved to be very hazardous and inaccurate,
probably because the sampling units are large and few in number, so 
that it is difficult to secure a representative sample. Furthermore, the 
method hypothesizes a considerable knowledge of the population in 
advance of the sampling process. This information is not often avail- 
able, and it has been found in a particular case that the facts about the 
population needed for controls served only for the particular year when 
the sampling survey was made (Ref. 23).

Applications of the purposive method have been made in certain
economic surveys by selecting so-called “typical” counties. The practice
of selecting a particular school or groups of schools in which experiments
are conducted may also be illustrations of this method, especially
if general conclusions are drawn for the educational factors under
investigation.

Double Sampling. [...] especially for sampling human [...]
(Ref. 21). This method [...]
proceed with the drawing of the small sample out of the strata comprising
the large sample. Accordingly, a more accurate estimate of the primary
character may be expected to be obtained from the stratification based
on the first investigation. The first sample must be large enough to
provide an accurate estimate of the population numbers if increased
accuracy of estimation is to result through the double sampling method.

A variant of this method is to find the regression of the primary on the 
secondary character from the data in the small sample. The predicted 
value in the regression equation which corresponds to the mean value of 
the second factor in the large sample is then used to estimate the mean 
value of the primary character for the total population (Ref. 1). 
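
The regression variant can be written out in a few lines. The sketch below assumes an ordinary least-squares regression of the primary character y on the secondary character x in the small sample; the data and the name `regression_estimate` are invented for the illustration.

```python
from statistics import mean

def regression_estimate(x_small, y_small, x_large_mean):
    """Estimate the population mean of y from the regression of y on x.

    The slope is fitted in the small sample; the prediction is then made
    at the mean of x observed in the large sample.
    """
    xb, yb = mean(x_small), mean(y_small)
    sxy = sum((x - xb) * (y - yb) for x, y in zip(x_small, y_small))
    sxx = sum((x - xb) ** 2 for x in x_small)
    b = sxy / sxx                              # least-squares slope
    return yb + b * (x_large_mean - xb)

# Hypothetical small sample in which y = 2x + 1 exactly.
x_small = [2.0, 4.0, 6.0, 8.0]
y_small = [5.0, 9.0, 13.0, 17.0]
estimate = regression_estimate(x_small, y_small, x_large_mean=5.5)
print(estimate)
```

Here the small sample has a mean of 5.0 for x and 11.0 for y; because the large sample indicates a somewhat higher mean for x, the estimate for y is adjusted upward along the fitted line.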

Subsampling. Cochran (Ref. 1) describes a method called subsampling,
in which a sampling unit may itself be enumerated by subsampling.
There might be a hierarchy of sampling units in multistage sampling; for
example, sampling units might be selected in the first stage of
randomization; within each such selected unit, smaller sampling units
then might be selected by another act of randomization, and so forth.
This special form of subsampling has been called “nested” sampling by
Mahalanobis (Ref. 19).
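
A two-stage selection of this kind can be sketched as follows. The population of schools and pupils is hypothetical, and the function name `two_stage_sample` and its parameters are illustrative; only two stages are shown, although the same step could be repeated for further stages.

```python
import random

def two_stage_sample(units, n_first, n_second, rng):
    """Select n_first primary units at random, then n_second elements within each."""
    chosen = rng.sample(sorted(units), n_first)            # first act of randomization
    return {name: rng.sample(units[name], n_second)       # second act, within each unit
            for name in chosen}

# Hypothetical population: 6 schools with 30 pupils each (labels invented).
schools = {f"school_{i}": [f"s{i}_pupil_{j}" for j in range(30)]
           for i in range(6)}

sample = two_stage_sample(schools, n_first=3, n_second=5, rng=random.Random(7))
for school, pupils in sample.items():
    print(school, pupils)
```

Passing an explicit `random.Random` instance keeps the selection reproducible, which is convenient when the list of selected units must be documented before field work begins.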


THE SELECTION OF THE SAMPLING SYSTEM

No simple principle exists which leads the investigator uniquely to 
the selection of a system of sampling. From the many sampling designs 
that can be constructed in order to answer the questions which prompted 
the research, one will be selected for application on the basis of the nature 
of the problem, the resources and the materials available or obtainable, 
and certain statistical and administrative considerations. 

From a statistical standpoint, the problem is to secure the best esti- 
mate of the population characters chosen for study. On the basis of 
knowledge of limiting distribution theory and of best linear unbiased 
estimates, it is the usual practice to take the standard deviation of the 
sample estimate about the character estimated as the measure of sampling 
error. The relative efficiency of different methods of estimation is 
obtained from the ratios of the reciprocals of the variances of sample 
estimates of the mean. The statistical criterion of efficiency is usually not
the only basis of deciding upon the sampling plan. Another principal con- 
sideration is the cost of the investigation. 
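
The ratio of reciprocal variances mentioned above is simple to compute. In the sketch below the two variances are invented figures for the sample mean under two designs, and `relative_efficiency` is an illustrative name.

```python
def relative_efficiency(var_a, var_b):
    """Efficiency of method A relative to method B: ratio of reciprocal variances."""
    return (1 / var_a) / (1 / var_b)

# Hypothetical variances of the sample mean under two designs (invented figures).
var_simple_random = 4.0
var_stratified = 2.5

efficiency = relative_efficiency(var_stratified, var_simple_random)
print(efficiency)
```

With these assumed figures the ratio is 1.6, which may be read as saying that simple random sampling would need about 1.6 times as many sampling units to match the precision of the stratified design, provided the variance of the mean is inversely proportional to the number of units.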

The basis of planning, therefore, is the selection of a sample design 
which combines precision of the results and expenditures in such a manner 
that either the cost is a minimum for any specified precision or the pre- 
cision is a maximum for any assigned cost. Considerable work has been 
done in recent years on the study of costs associated with the various 
sampling and estimating operations, including the determination of the
relative magnitudes of variances and covariances between and within
various kinds of sampling units.

Thus, although no complete theory with practical applicability is
available whereby the investigator always could be certain of selecting
the “best” sampling design and at the same time the “best” process of
estimation and allocation of sampling units, considerable empirical and
scientific knowledge is available upon which an intelligent selection can
be made. To a certain extent each field of study may have its own
peculiar sampling problems. But the principles so far educed have wide
and general application. Often an exploratory or pilot investigation
may save a good deal of time and unnecessary expense by providing useful
information on the cost and variance, or error functions. In addition,
the exploratory period can be used advantageously in giving training to
workers in both field and statistical work and thus in controlling mistakes
and errors arising from the human factor.


STATISTICAL ASPECTS OF SAMPLING DESIGNS

The statistical planning of the program for obtaining observations
from samples involves the problems of specification and estimation. The
mathematical form of the population is known or assumed to be known,
but the values of one or more parameters entering into the form are
unknown. Estimates of one or more parameters are desired, each with
minimum sampling error.

In most statistical investigations by sample, a central problem is to
ascertain the value of an average (Ref. 5). The value of an average is

$$\bar{X} = \frac{\sum_k \sum_i X_{ki}}{\sum_k m_k} \qquad (9.03)$$

where $m_k$ is the number of sampling units in the kth stratum and
$X_{ki}$ is the value of the variate in the ith element of the kth
stratum; $\sum_k m_k$ may be known or unknown, finite or infinite. The
major sampling processes, such as random sampling with or without
replacement, stratified random sampling of individual elements, stratified
random sampling of groups or clusters, double sampling, and purposive
sampling, can be illustrated and differentiated by the different grouping
methods for each of which the sum $\sum_k \sum_i X_{ki}$ in Equation
(9.03) is obtainable.

Insight as to the arrangement of strata and the average to compute
has grown out of the study of the problem of estimation in stratified
sampling of groups. In stratified sampling, (9.03) becomes

$$\bar{X} = \frac{\sum_k m_k \bar{X}_k}{\sum_k m_k}$$

where $\bar{X}_k$ equals the average value of X in the kth stratum. In some
problems, it has been found that, by choosing the strata so that the
regression of $\bar{X}_k$ on some appropriately selected variate Y is linear,
an improved estimate of $\bar{X}$ can be made (see Double Sampling, above).
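
The stratified form of the average can be checked numerically. The sketch
below, offered only as an illustration, weights each stratum mean by its
number of sampling units; the function name is hypothetical, and the sizes
and means are the four subclass totals and mean ages reported in Table 45
later in this chapter.

```python
def stratified_mean(sizes, means):
    """Weighted average of stratum means, with the stratum sizes m_k as weights."""
    total = sum(sizes)
    return sum(m * x for m, x in zip(sizes, means)) / total


# Subclass sizes and mean ages as reported in Table 45.
sizes = [6668, 10180, 3025, 4522]
means = [17.56, 17.53, 17.74, 17.68]

overall = stratified_mean(sizes, means)
print(round(overall, 2))
```

To two decimals the weighted average agrees with the mean of 17.59 years
recorded in Table 45 for the state as a whole.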

In general, there is no unique unbiased estimate of a parameter. 
Under particular conditions the best estimate can be found if the quantity 
is a linear function of the observations as in (9.02) above. A method and


(9.04) 


character within the stratum. The “best” estimate is defined by the two 
conditions that (1) it should be a linear unbiased estimate with (2) 


A fundamental condition in the best solution is that the total number 
of sampling units must be kept constant. In Neyman’s method the best 
solution depends on a knowledge of the population standard deviation 
of each stratum. Sukhatme (Ref. 26) investigated the effect of estimating
the standard deviation of the different strata by a preliminary inquiry.
He concluded that a gain in efficiency takes place even in the case where
the population standard deviations, the σ's, are estimated from the sample
standard deviations, the s's, that is, when the σ's are different in
different strata.


Mahalanobis (Ref. 19
involved in large-scale sa 


TYPES OF ERROR IN INVESTIGATION BY SAMPLE

Statistical data are the raw material of judgments ... comparisons, and
... truth. The highly condensed form to which the original data are usually
reduced by processes of statistical reduction gives to the final results a
display of exactness that is not necessarily intrinsic. In viewing the 
final product, one should not forget the original material from which it 
came. In order to evaluate the findings from an investigation, much 
information is necessary as to the ways in which the original data were 
collected, the conditions surrounding them, and the kinds of errors to 
which they are susceptible. We wish to consider here the types of 
errors which are present in every study by sample. 

Random Sampling Errors. First, there are the random sampling 
errors or sampling fluctuations dealt with in the theory of probability 
and in the theory of sampling distributions. They are the outcome of the 
random sampling process, and sampling theory enables us to estimate 
them when we know their form of distribution. Random sampling 
errors have the advantageous property that they can be controlled by 
regulating the design and size of the sample. We have considerable 
theoretical and experimental knowledge of this type of error. Often, 
however, particularly in sampling survey studies, this is the smallest 
error in the collected data. 

Systematic Errors. Apart from sampling fluctuations, errors also
originate from the unreliability of human observers, either in direct
observation or in other forms of measurement. Errors of measurement
are usually much greater in biological, psychological, economic, or social
investigations than in the physical sciences. Insofar as observational
errors originate unconsciously, they may more or less follow the normal
distribution of errors so that positive and negative deviations would tend
to cancel increasingly as the number of observations increases. It is a
mistake, however, to rely upon these errors' canceling one another. They
may often possess not only a random element but also a bias. A special
study needs to be carried out, either by repeating the observations or
measurements by the same observer or by more than one observer or by
some other type of control, and the results compared. In some
observations, we are at times prone to dismiss as unessential conditions
about which we think we know more or less. At times there may be
justification for this attitude. It is good practice, however, to test the
possibility of the circumstance as a cause by arranging the observations
according to this circumstance. If the assumed ... is ..., it is
found that the errors of the observations display a regularity not found
in chance errors. Wrong assumptions concerning the ... of the
circumstance may bring about similar findings in a comparison ...
with the results of observations. Errors of this type are called systematic
errors.

Miscellaneous Inaccuracies. Contrasting sharply with random
observational errors and sampling errors, inaccuracies may arise in a
number of ways. The worst of these originate from such practices as false
entries



In making a house-to-house survey, it is obvious that much depends 
on the resourcefulness, skill, and reliability of the investigators. The 


poorly defined, or on matters of opinion, depends considerably on the 
form of questions asked. Sometimes the result of the inquiry is
conditioned by the investigation itself, as, for instance, when the person
interviewed may not have heard or thought of the subject before.

Much use of the questionnaire is made in collecting information from 
re often unwilling or 
People vary greatly in the 


ons, trained and experienced 
workers are necessary if the information collected is to be relied upon. 


plete directions as to how the 


enumeration may render difficult 
in different decades, Uniformity 


In fact, at times 
Е practice, so as to ensure continuity 
At any rate, if changes need to be 
f two sets of data, at least for some 
er under the new, so that continuity 


This need for uniform conditions might be illustrated if an attempt 
were made to interpret the differences between the health status of men 
eligible for Army service in 1917 and in 1941. Such difficulties as the 
following would be likel c i i i 


ce 1917 have made pos- 


physical disabilities, шз for identitying 


The valid interpretation of final statistical results requires a knowledge
of the conditions surrounding the events recorded at the place and time
of observation. For instance, there are many limitations on the use of
physical-examination findings of selectees in World War II for drawing
inferences concerning the general health status or the incidence of minor
defects among the population (Ref. 14). The examinees at any induction 
station comprised a partly selected and widely variable sample of the 
male population at a specific time and place. The composition of the 
selectees chosen for examination was conditioned by (1) prevailing Selec- 
tive Service policies with respect to deferments for dependency, (2) 
practices of the Armed Forces in regard to the acceptance of special 
groups, (3) the extent of differential screening of local boards, and (4) 
the number of men previously rejected who were sent up for re-examina- 
tion. The comparison, for example, of those individuals who were 
rejected during the prewar period of Selective Service with those rejected 
at various periods during the war would require careful interpretation. 
The high rejection rates of the former do not necessarily imply a low 
level of national health. 

Differing Types of Canvass. Deming (Ref. 6) enumerated and discussed
13 different factors that affect the usefulness of survey studies.
This comprehensive and informative discussion includes additional
types of errors or additional properties of errors not hitherto discussed.
Only brief consideration can be given to these. Information is needed
with respect to differences in results obtained from different kinds and
degrees of canvass, such as mail, telephone, telegraph, and interviews;
also from different types of questionnaires. Different results are obtained
by the different sponsoring agencies under whose auspices the survey
study is carried out. For example, studies on income and work status
yield different results when conducted by a relief agency than when
conducted by a government agency. Because of this bias, government
and private organizations have at times contracted with other agencies
for the collection of data. Cohen (Ref. 2) reports an instance where in
China one census, taken for poll-tax and military purposes, gave a
population of 28,000,000. Another census over the same area, taken
this time for famine relief, returned a population of 105,000,000.

Changes in Population. There may be changes in the population in
the interval between the time of collection of data and the ...
A sample may be more reliable than complete enumeration because of the
shorter period required for collecting ... the data must commence at a
certain date, reports ... this deadline are not included. The late ...
A sample study of these belated reports may at times ... present. The
comparison of two or more samples ... design or of subsamples within the
main sample ... “systematic error” inherent in the methods. If two ...
indicate not that they are devoid of bias but that ... similar.

Unrepresentative Date. Bias can occur from an unrepresentative
choice of a date for a survey or a period to be covered. For instance, a
passenger-traffic survey would not be representative if taken on or near
a holiday date, nor would a school survey taken, say, the first week in 
June. Comparison of retail sales made in April, 1938, with those in 
April, 1937, gave spurious results, since the Easter holiday in 1937 came 
at the end of March, whereas in 1938 Easter occurred in the middle of 
April (Ref. 2). 

In Processing. Processing errors may result from differences among 
workers in interpreting the wording of instructions, in editing, and in 
field work. Machine and tally errors need to be checked. 


PLANNING THE INVESTIGATION 


erated here, such as 

errors of response, late reports, errors 
originating in the tabulation plans, bias from unrepresentative dates or 
periods, changes taking place in the population before tabulations become 
available, and errors in interpretation. Furthermore, even if there is 
100 per cent Coverage, this is still a sample since at any other given time 


preliminary consideration of all the errors 


ermines whether or not 


Once a decision to proceed has 
In error will be de 


hat the more si 
Dias, 


of the classical theory of errors. 


be cared for largely by knowledge of and control of
atic errors in the data may be

Except for the random factors that might balance out, further increase
in size of the sample does not increase the accuracy by eliminating
systematic errors. Nor would they disappear if complete enumeration
were resorted to.
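
The point that a larger sample does not remove a systematic error can be
sketched by simulation. The following Python fragment is only an
illustration under stated assumptions: the true mean, the constant bias,
the standard deviation, and the seed are all hypothetical values, and a
pseudo-random generator stands in for real observations.

```python
import random
import statistics


def simulate_mean(n, true_mean=17.6, bias=0.3, sd=0.8, seed=1):
    """Mean of n observations, each carrying a constant bias plus random error."""
    rng = random.Random(seed)
    return statistics.fmean(
        true_mean + bias + rng.gauss(0.0, sd) for _ in range(n)
    )


# The random part of the error shrinks as n grows; the bias of 0.3 does not.
for n in (100, 10_000, 100_000):
    error = simulate_mean(n) - 17.6
    print(f"n = {n:>7}: error of the sample mean = {error:+.3f}")
```

As n increases, the error of the sample mean settles near the assumed bias
rather than near zero, which is the behavior described in the paragraph
above.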

It is an essential part of the sampling design to provide statistical 
controls for detecting and guarding against systematic types of errors. 
One way of doing this, for instance, in a sample survey is to collect two 
or more interpenetrating subsamples, which may be independent or 
partially linked together (Ref. 19). Such a simple control may not 
always suffice. It may be advisable, therefore, to arrange for the survey 
of the same sample, wholly or in part, by two or more different workers.
Just which sources of error are to receive the most attention will depend 
upon their importance in relation to the accuracy with which the study 
must be carried out in order to produce useful results with the funds 
available. This is a matter for the particular problem. Knowledge
of the actual conditions and the types of systematic variation likely to
arise in them, and how they may be eliminated or reduced when necessary,
is basic.

The margin of error of the final estimate that can be tolerated if
the conclusions drawn are to merit confidence must be considered in
light of all kinds of errors to which the data are susceptible. The lack
of accuracy and reliability in the data cannot, of course, be overcome by
the subsequent statistical analysis that is applied. Thus, the task is first
to secure data that are sufficiently precise for the purpose in hand and
then to apply methods of analysis that make the best possible use of the
information they contain.


PROCEDURES IN RANDOM SAMPLING

In random and in representative sampling, a fundamental assumption
is that the sample is random. Upon the fulfillment of this assumption
rests the validity of the application of most of the statistical analysis.
The objective measurement of errors of estimation and the determination
of the significance of the sample results are dependent on the hypothesis
of the randomness of the sampling errors. It is, therefore, of interest
and importance to note what solution, if any, the statistician formulates
so that he can proceed with confidence in his analysis.

The information as to whether a sample is random is not available
through examination of the properties of the sample itself. This
shortcoming is illustrated by some of the hands which are obtained from
dealing at random from a pack of cards, for instance, a hand containing
13 diamonds. The criterion, therefore, of a random sample has to be
sought elsewhere, namely, in the process or method of selection. If a
random method of selection can be developed, then a random sample can
be simply defined as a sample which has been obtained by a random
method.

one of a set of such aggregates. Here it is assum
objects means that the set was obt. 


recalled that random sampling at the outset is designed to give every 


ce of being the actual sample. 
at it should be independent 
r investigation. Since this 


m for one Population and not for another. In 
T one characteristic of a 


illustrates this Point by citing the 


х A . Я , Study was to determine the 
Proportion of inhabitants w 1t probably woul d lose # 1f 
mating the distribution of 


This might appear to be an impos- 
Sible task, since, as has been pointed is in the very nature of hs 
at least one of its char- 
oved by superimposing a 
and Sampling in accordance with it. 


then the problem of random sampli 
discovering a series of random numb 

The customary way is to number the universe in any practical man- 
ner, whether or not related to its Properties, and then to look for a set of 
numbers so that they constitute а random aggregate from the possible 
ordinal numbers of the universe, Thus 


determining in each case whether a sampli 


ers,

characteristics of the population, it became necessary only to construct
a set of digits capable of giving a random sample of any size from any
finite set of integers. Under such conditions it may be expected that the
arrangement of digits in the sampling numbers will not be associated
with the characteristics of the universe. Such was the principle upon
which sets of “random sampling numbers” have been compiled.


Kendall (Ref. 11) specifies certain requirements, other than that of
having been chosen at random, that a set of random sampling numbers
must satisfy if it is to be used for random sampling. Each digit in a set
of N random sampling numbers is expected to occur in N/10 cases and
each pair of digits to occur an equal number of times. He speaks of a
set with such properties as locally random and gives four necessary
tests, although they are not sufficient, to determine the existence of local
randomness:
(1) The frequency test. Each digit should occur an approximately
equal number of times.
(2) The serial test. There should be no tendency for a digit to be
followed by any particular other digit.
(3) The poker test. There will be certain expectations to be satisfied
for digits to be arranged in blocks of, say, five, four, three, and ...
(4) The gap test. There are certain expectations to be satisfied with
regard to the gaps occurring between the same digits in the series.
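
The first two of these tests can be sketched as chi-square computations.
The Python fragment below is a minimal illustration, not Kendall's own
procedure: the function names are hypothetical, the serial test is applied
to non-overlapping pairs (one of several possible conventions), and a seeded
pseudo-random generator stands in for a printed table of digits.

```python
import random
from collections import Counter


def frequency_chi2(digits):
    """Chi-square statistic comparing digit counts with the N/10 expectation."""
    expected = len(digits) / 10
    counts = Counter(digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def serial_chi2(digits):
    """Chi-square statistic over the 100 possible non-overlapping digit pairs."""
    pairs = [(digits[i], digits[i + 1]) for i in range(0, len(digits) - 1, 2)]
    expected = len(pairs) / 100
    counts = Counter(pairs)
    return sum(
        (counts.get((a, b), 0) - expected) ** 2 / expected
        for a in range(10)
        for b in range(10)
    )


# Assumption of this sketch: pseudo-random digits replace a published table.
rng = random.Random(12345)
digits = [rng.randrange(10) for _ in range(10_000)]
print("frequency chi-square (9 d.f.): ", round(frequency_chi2(digits), 2))
print("serial chi-square (99 d.f.):   ", round(serial_chi2(digits), 2))
```

Each statistic would be referred to the chi-square table with the degrees of
freedom shown; a very large value would indicate a departure from local
randomness in the sense described above.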


There are two sets of random sampling numbers in common use, Tippett's
and Fisher and Yates's (Ref. 7). A third set has been published by
Kendall and Smith (Ref. 12). Tippett compiled his set ... digits at
random from census reports and by combining ... four-figured numbers.
They have been subjected to ... which they have met the criteria of
randomness used. Fisher and Yates's set of random numbers was
constructed from the fifteenth and nineteenth digits of A. J. Thompson's
20-figure logarithmic table. The authors present tests of its randomness.

Each of the compilations is accompanied by a number of illustrations
of its use. If, for instance, a random sample is wanted from a list or
roster of names, the procedure would be as follows: First each sampling
unit is numbered in any way, systematic or otherwise. The tables are
then opened at random and starting at any point and proceeding in any
direction, such as up or down the columns, along the rows, or by some
other predetermined plan, a sufficient number of pairs of digits or other
combinations are taken to make up the predetermined size of the sample.
Whenever the same number occurs twice or more, the repetitions are simply
ignored. All numbers which exceed the total number of sampling units are
also ignored.
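
The selection procedure just described can be sketched in Python. This is
a hedged illustration only: the function name and its parameters are
hypothetical, a seeded pseudo-random generator stands in for the printed
table, and the sketch assumes that units are numbered from 1, so that a
group of zeros is also ignored.

```python
import random


def draw_sample(digit_stream, population_size, sample_size, width):
    """Select distinct unit numbers 1..population_size from a stream of digits.

    Digits are read in consecutive groups of `width`. A number is ignored if
    it is zero, exceeds the population size, or repeats an earlier draw.
    """
    chosen = []
    seen = set()
    group = []
    for d in digit_stream:
        group.append(d)
        if len(group) < width:
            continue
        number = int("".join(map(str, group)))
        group = []
        if 1 <= number <= population_size and number not in seen:
            seen.add(number)
            chosen.append(number)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("digit stream exhausted before the sample was complete")


# Assumption of this sketch: a seeded generator replaces the printed table.
rng = random.Random(2024)
stream = (rng.randrange(10) for _ in range(100_000))
sample = draw_sample(stream, population_size=75, sample_size=10, width=2)
print(sample)
```

For a roster of 75 names, pairs of digits suffice; a longer roster would
need three or more digits per group, at the cost of discarding more numbers.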

Other methods of drawing random samples are used, such as using 
coins, dice, roulette wheels, or cards. Great care must be taken, how- 
ever, to avoid bias in using such mechanical means. The human being 
has been shown to be especially incompetent to make a random selection. 
The problem of selecting a random sample has been greatly simplified 
by the preparation of tables of random sampling numbers. When the 
rules of the game are scrupulously observed, their use likely gives the 
best guarantee now available of obtaining a random sample. 


A COMPARATIVE EXPERIMENT IN SAMPLING METHODS

In order to illustrate some of the principles underlying sampling
procedures that have been discussed in this chapter, an experiment was
carried out. Its findings are presented herewith.

We have a finite population consisting of 24,395 high school graduates
whose ages were given as of the nearest birthday at the time of graduation.
They have been classified according to sex and location of high school, as
given in Table 45. The means and standard deviations in years for the
total population and for each of the four subclasses are also recorded in
Table 45.


TABLE 45
AGES OF 1933-1944 HIGH-SCHOOL GRADUATES IN PUBLIC SCHOOLS OF MINNESOTA
CLASSIFIED ACCORDING TO SEX AND SIZE OF LOCALITY*

                        Outside 3 cities          3 cities
         State as        of first class        of first class
Age      a whole        Boys      Girls       Boys      Girls
15            84          26         43          6          9
16         1,585         457        812        115        201
17         8,729       2,486      3,870        930      1,443
18        12,148       3,269      4,726      1,667      2,486
19         1,562         352        637        239        334
20           216          56         73         46         41
21            71          22         19         22          8
Total     24,395       6,668     10,180      3,025      4,522
Mean       17.59       17.56      17.53      17.74      17.68
S.D.       .7799       .7763      .7903      .7848      .7352


We shall assume that we wish to estimate the ages of these graduates by
taking a sample of 1,000 from the population of 24,395. We shall use
three different methods of stratification, assuming that each age group
is (1) evenly distributed among the subclasses, (2) stratified
proportionately to the sizes of subclasses, and (3) stratified
proportionately to the products of the sizes and standard deviations of
the subclasses.

First we shall describe the method of drawing the sample of 1,000 
graduates from this population as a whole. 

The first step was to assign a five-place number to each element of
the population (see Table 46).


TABLE 46
ASSIGNMENT OF RANDOM SAMPLING NUMBERS
TO THE 24,395 GRADUATES OF TABLE 45

Age    Numbers
15     00,001-00,084
16     00,085-01,669
17     01,670-10,398
18     10,399-22,546
19     22,547-24,108
20     24,109-24,324
21     24,325-24,395
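
The blocks of numbers in Table 46 are simply the running totals of the age
frequencies in Table 45. The Python sketch below, given only as an
illustration with hypothetical names, rebuilds the upper limit of each
block and looks up the age to which any five-place number belongs.

```python
from bisect import bisect_left
from itertools import accumulate

AGES = [15, 16, 17, 18, 19, 20, 21]
FREQUENCIES = [84, 1585, 8729, 12148, 1562, 216, 71]  # Table 45, state as a whole
UPPER_LIMITS = list(accumulate(FREQUENCIES))          # last number of each block


def age_for_number(number):
    """Return the age whose block of serial numbers contains `number`."""
    if not 1 <= number <= UPPER_LIMITS[-1]:
        raise ValueError("number outside 00,001-24,395; it would be discarded")
    return AGES[bisect_left(UPPER_LIMITS, number)]


for n in (84, 85, 10398, 10399, 24325):
    print(f"{n:05d} -> age {age_for_number(n)}")
```

The running totals end at 24,324 for age 20, so the block for age 21 runs
from 24,325 to 24,395.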


The second step was to read Fisher and Yates's Table of Random 
Sampling Numbers (Ref. 7), page by page, first horizontally and then 
vertically. Each time five consecutive figures were read; they consti- 
tuted a five-place number which was then referred to Table 46 to give 
the element an age score. Whenever a number larger than 24,395 was 
obtained, it was discarded. In this way, we formed a sample of 1,000
as indicated in Table 47.


TABLE 47
A SAMPLE OF 1000 DRAWN BY THE METHOD
OF RANDOM SAMPLING NUMBERS

         State as
Age      a whole
15             4
16            63
17           378
18           486
19            53
20            11
21             5
Total       1000

The final step was to stratify this sample of 1,000 according to the
three methods enumerated above.

The first method, that of stratification with no restriction, was very
simple. We simply split each age group into four subgroups as reported
in Table 48.


TABLE 48
STRATIFICATION OF THE SAMPLE OF 1000 GRADUATES WITH NO RESTRICTIONS

         Outside 3 cities          3 cities
          of first class        of first class
Age      Boys      Girls       Boys      Girls
15       1.00       1.00       1.00       1.00
16      15.75      15.75      15.75      15.75
17      94.50      94.50      94.50      94.50
18     121.50     121.50     121.50     121.50
19      13.25      13.25      13.25      13.25
20       2.75       2.75       2.75       2.75
21       1.25       1.25       1.25       1.25


In using the second method, that of stratification proportionate
to the total number in the population in each of the four subclasses, we
needed first to compute the proportions of the four subclasses. Let us
denote by $N_1$ and $N_2$ the numbers of boys and girls respectively,
outside the three cities; by $N_3$ and $N_4$, the numbers of boys and
girls respectively, inside the three cities. Then we calculate:

$$
\begin{aligned}
N_1 : N_2 : N_3 : N_4 &= 6{,}668 : 10{,}180 : 3{,}025 : 4{,}522 \\
&= \frac{6{,}668}{24{,}395} : \frac{10{,}180}{24{,}395} : \frac{3{,}025}{24{,}395} : \frac{4{,}522}{24{,}395} \\
&= .2733 : .4173 : .1240 : .1854
\end{aligned}
$$

Each age group was then split according to this ratio. The resultant
stratification is reported in Table 49.


TABLE 49
STRATIFICATION OF THE SAMPLE OF 1000 GRADUATES ACCORDING TO PROPORTIONATE
NUMBERS IN THE POPULATION STRATA

         Outside 3 cities          3 cities
          of first class        of first class
Age      Boys      Girls       Boys      Girls
15       1.09       1.67       0.50       0.74
16      17.22      26.29       7.81      11.68
17     103.31     157.74      46.87      70.08
18     132.82     202.81      60.26      90.10
19      14.48      22.12       6.57       9.83
20       3.01       4.59       1.36       2.04
21       1.37       2.09       0.62       0.93
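
The arithmetic of the second method can be retraced with a short Python
sketch. It is offered as an illustration only; the function names are
hypothetical, and the proportions are rounded to four places before the
split, as the text does.

```python
SAMPLE = {15: 4, 16: 63, 17: 378, 18: 486, 19: 53, 20: 11, 21: 5}  # Table 47
SUBCLASS_SIZES = [6668, 10180, 3025, 4522]                         # Table 45 totals


def proportions(weights, places=4):
    """Each weight as a fraction of the total, rounded to `places` decimals."""
    total = sum(weights)
    return [round(w / total, places) for w in weights]


def split(sample, ratios):
    """Split every age-group count in the sample according to `ratios`."""
    return {age: [round(n * r, 2) for r in ratios] for age, n in sample.items()}


ratios = proportions(SUBCLASS_SIZES)
table = split(SAMPLE, ratios)
for age, row in table.items():
    print(age, row)
```

Because the four rounded proportions sum to 1.0000, each row of the split
adds back to the age-group count in Table 47, apart from rounding in the
second decimal.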

In using the third method, that of stratification proportionate to the
products of the sizes and standard deviations of the subclasses, let
$N_1$, $N_2$, $N_3$, and $N_4$ have the same meaning as before. Denote by
$\sigma_1$ and $\sigma_2$ the standard deviations of the ages of boys and
girls respectively, outside the three cities; by $\sigma_3$ and
$\sigma_4$, the respective standard deviations inside the three cities.
Then we calculate:

$$
\begin{aligned}
N_1\sigma_1 : N_2\sigma_2 : N_3\sigma_3 : N_4\sigma_4
&= 6{,}668(.7763) : 10{,}180(.7903) : 3{,}025(.7848) : 4{,}522(.7352) \\
&= 5{,}176 : 8{,}045 : 2{,}374 : 3{,}325 \\
&= \frac{5{,}176}{18{,}920} : \frac{8{,}045}{18{,}920} : \frac{2{,}374}{18{,}920} : \frac{3{,}325}{18{,}920} \\
&= .2736 : .4252 : .1255 : .1757
\end{aligned}
$$

Each age group was then split according to this ratio. The resulting
stratification is reported in Table 50.


TABLE 50
STRATIFICATION OF THE SAMPLE OF 1000 GRADUATES PROPORTIONATE TO THE PRODUCT
OF THE NUMBERS AND STANDARD DEVIATIONS IN THE POPULATION STRATA

         Outside 3 cities          3 cities
          of first class        of first class
Age      Boys      Girls       Boys      Girls
15       1.09       1.70       0.50       0.70
16      17.24      26.79       7.91      11.07
17     103.42     160.73      47.44      66.41
18     132.97     206.65      60.99      85.39
19      14.50      22.54       6.65       9.31
20       3.01       4.68       1.38       1.93
21       1.37       2.13       0.63       0.88
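
The ratios of the third method weight each subclass by the product of its
size and its standard deviation. A minimal Python sketch of that
calculation follows; the function name is hypothetical, and the products
are left unrounded before the ratios are formed, which is an assumption of
this sketch that does not change the four-place results.

```python
SIZES = [6668, 10180, 3025, 4522]        # N1..N4 from Table 45
SDS = [0.7763, 0.7903, 0.7848, 0.7352]   # standard deviations from Table 45


def size_sd_ratios(sizes, sds, places=4):
    """Allocation ratios proportional to the products N_k * sigma_k."""
    products = [n * s for n, s in zip(sizes, sds)]
    total = sum(products)
    return [round(p / total, places) for p in products]


ratios = size_sd_ratios(SIZES, SDS)
print(ratios)

# Split of the 486 graduates aged 18 in the sample (Table 47).
age_18_row = [round(486 * r, 2) for r in ratios]
print(age_18_row)
```

Compared with the split proportionate to numbers alone, the girls outside
the three cities, who have the largest standard deviation, receive a
slightly larger share, and the girls inside the three cities a smaller one.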


We then tested the goodness of fit for the three kinds of stratification
by using the $\chi^2$-criterion. Before doing this we needed to compute the
theoretical expectations of frequencies for each age group in each subclass
if we drew a sample of 1000 exactly representative of the parent population.
The calculations of the theoretical expectations are reported in Table 51.


TABLE 51
CALCULATION OF THE THEORETICAL EXPECTATIONS OF FREQUENCIES FOR EACH AGE
GROUP IN EACH SUBCLASS FOR A REPRESENTATIVE SAMPLE OF 1000 GRADUATES

                                  F            Per Cent      Theoretical
                            (Frequency of                     frequency
Subclass            Age      population)     (F / 24,395)   (Per Cent × 1000)

Outside three cities
  Boys              15             26           .00107            1.07
                    16            457           .01873           18.73
                    17          2,486           .10191          101.91
                    18          3,269           .13400          134.00
                    19            352           .01443           14.43
                    20             56           .00230            2.30
                    21             22           .00090            0.90
  Girls             15             43           .00176            1.76
                    16            812           .03329           33.29
                    17          3,870           .15864          158.64
                    18          4,726           .19373          193.73
                    19            637           .02611           26.11
                    20             73           .00299            2.99
                    21             19           .00078            0.78
Three cities
  Boys              15              6           .00025            0.25
                    16            115           .00471            4.71
                    17            930           .03812           38.12
                    18          1,667           .06833           68.33
                    19            239           .00980            9.80
                    20             46           .00188            1.88
                    21             22           .00090            0.90
  Girls             15              9           .00037            0.37
                    16            201           .00824            8.24
                    17          1,443           .05915           59.15
                    18          2,486           .10191          101.91
                    19            334           .01369           13.69
                    20             41           .00168            1.68
                    21              8           .00033            0.33
Total                          24,395          1.00000         1000.00


The test of the goodness of fit of the method of randomization without
restrictions gave a value of $\chi^2$ = 262.2836. Referring to the $\chi^2$
table with 18 degrees of freedom, we find that P < .001. Therefore, we
conclude that this kind of stratification is not a good fit to the
theoretical expectations.

The test of goodness of fit of the distribution of observed values from
the method of stratification according to proportionate numbers and the
theoretical distribution gave a value of $\chi^2$ = 20.1521. Referring to
the $\chi^2$ table with 18 degrees of freedom, we find that the
corresponding value of P > .30. Therefore, we conclude that the
stratification proportionate to subclass numbers in this case is a good
fit to the theoretical expectations.

From the test of the goodness of fit for the method of stratification
according to the product of the numbers and standard deviations in the
subclasses, we found $\chi^2$ = 18.1743. Referring to the $\chi^2$ table
with ... ratio of $N_1 : N_2 : N_3 : N_4$. We do, however, note a reduction
in $\chi^2$ in taking into account the subclass standard deviations.
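
The goodness-of-fit value for the first method can be retraced from the
tables. The Python sketch below is an illustration only, with hypothetical
names: the expected frequencies are transcribed from Table 51, the observed
frequencies are the equal split of Table 48, and the statistic is the usual
sum of (observed − expected)² / expected over the 28 cells.

```python
EXPECTED = {  # theoretical frequencies transcribed from Table 51, ages 15-21
    "boys_outside":  [1.07, 18.73, 101.91, 134.00, 14.43, 2.30, 0.90],
    "girls_outside": [1.76, 33.29, 158.64, 193.73, 26.11, 2.99, 0.78],
    "boys_inside":   [0.25, 4.71, 38.12, 68.33, 9.80, 1.88, 0.90],
    "girls_inside":  [0.37, 8.24, 59.15, 101.91, 13.69, 1.68, 0.33],
}
SAMPLE = [4, 63, 378, 486, 53, 11, 5]  # Table 47, ages 15-21


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Method 1: each age group divided equally among the four subclasses (Table 48).
equal_split = [n / 4 for n in SAMPLE]
chi2_equal = sum(chi_square(equal_split, exp) for exp in EXPECTED.values())

# Degrees of freedom: (7 age groups - 1) x (4 subclasses - 1).
df = (len(SAMPLE) - 1) * (len(EXPECTED) - 1)
print(f"chi-square = {chi2_equal:.2f} with {df} degrees of freedom")
```

The statistic agrees with the reported value of 262.28 to two decimals, and
the 18 degrees of freedom are those used in the text. The same function
could be applied to the rows of Tables 49 and 50 to retrace the other two
tests.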


PROBLEMS

1. Work out a sampling design for securing data about the number of
students enrolled in the several high-school subjects in your state.

2. Design a sampling survey for obtaining data concerning promotion
policies for teachers in the elementary schools of your state.

3. Secure a representative sample of schools to engage in a cooperative
experiment testing the relative efficacy of different curricular
practices in secondary schools.

4. Design a sample survey for securing the best estimate of student
enrollment in institutions of higher education in the United States;
this information to be made available within a month after the
opening of the institutions in the fall.

5. Set up a plan for a survey by sample of the attitude of the public
toward Federal support of education to equalize educational
opportunities.

6. Set up a sample of schools in your state which can be used recurrently
for the collection of school statistics. Design the sample so that
designated portions of the schools are taken out each year and new
schools added so that no school carries an excessive burden.

7. Compare a method of sampling with the method of complete survey
for a specified educational problem with respect to cost and time
required to issue the results.

8. What recent developments have taken place in the techniques of
questionnaire construction, in procedures in carrying out the
interview, and in bringing about maximum returns from prospective
respondents?

9. What methods have been developed to control error in the processing
of survey data?

10. How can developments taking place in electrical and electronic
equipment be applied to large sample surveys?

11. Suggest methods based on statistical and research principles which
could be used for improving and standardizing procedures for
collecting school statistics in your state.

12. Evaluate the sampling procedures used in Kinsey, Alfred C.,
Pomeroy, Wardell B., and Martin, Clyde E., Sexual Behavior in the
Human Male. Philadelphia: W. B. Saunders Company, 1948.

13. Criticize the sampling methods used in the Revision of the Stanford-
Binet Scale. See Marks, Eli S., “Sampling in the Revision of the
Stanford-Binet Scale,” Psychological Bulletin, Vol. 44 (1947), pp.
413-434.

14. Compare the relative efficiency of the three different sampling
methods described in the text for estimating the ages of high-school
graduates by calculating the mean and standard deviation for each
method and comparing these estimates with the population values.
Calculate the sampling errors for each method (See Note in Problem
15).

15. Specify methods of forming estimates and calculating sampling
errors for each of the following sampling methods:
(a) Random sampling (no restrictions)
(b) Stratified sampling
(c) Cluster sampling
(d) Sub-sampling
(e) Stratification for two or more factors
(f) Balancing

Note: This problem should be postponed until the student has
studied the techniques of analysis of variance and covariance.

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CHAPTER Х 
ANALYSIS OF VARIANCE AND COVARIANCE 


The analysis-of-variance technique developed by R. A. Fisher and
first reported in 1923 (Ref. 7) constitutes a method capable of analyzing the
variation to which experimental and observational material is subject so
that an assessment of the various components of variation can be made.
Since its introduction, the analysis of variance has become more and
more useful to large numbers of research workers in many fields.
Fisher's method is the only efficient one so far developed by which it is
possible to differentiate the variation according to causes or groups of
causes and to interpret the significance of a number of components
simultaneously.

The modern advances in experimental and sampling designs have
become possible through the development of exact tests of significance
and of the analysis of variance. Without these tools, the assessment
of the components of variation traceable to the sources specified by the
[...] arranged conveniently for the application
of the necessary tests of significance, [...]

Assume that we have a measure of a [...] specified by the letter X.
This value of X [...] equal to another or for repeated measurements [...].
In general, the variation is due to a large [...] causes. Of these factors
some may be [...] may be called assignable causes [...] numerous other
causes which [...] ignorance concerning them. These [...] we gain in
knowledge, more and more factors become assignable until [...] explained
if we can identify all the factors. The contribution of the known and
unknown [...] may be regarded, at least to a first approximation, [...]
and may be represented symbolically thus:

$$X = a + b + c + \cdots + z \tag{10.01}$$

where a, b, c, . . . denote the respective contributions of the known
factors A, B, C, . . . , and z represents the residual or the portion of X
attributable to chance or unknown factors. If, for instance, the factors
A, B, C, . . . can be maintained under complete control, their respective
contributions a, b, c, . . . will continue to be constant, whereas the
fluctuations from unit to unit in X will be entirely attributable to the
variation in z.

In experimental work various hypotheses may be advanced with
respect to the effect of one or more factors, namely, A, B, C, . . . , and
experimental designs are prepared to make the best determination of the
presumed effects, a, b, c, . . . . The measures obtained of the presumed
effects need to be tested with a view to determining their significance.
If the measured effects are real, that is, traceable to the origin specified
by the particular experimental design, the experimental results would be
characterized as heterogeneous in variation. If, however, the variation
presumably contributed by the several independent contributions of the
factors A, B, C, . . . is only of the order of magnitude of the
effect assigned to the random sources of variation, the conclusion would
be that the presumed effects were not real but attributable to random
causes. The variation in the experimental material would then be
spoken of as homogeneous. That is, strictly homogeneous variation is
purely random: it is caused by a multiplicity of minor
independent factors, incapable of resolution into more elemental form
and indistinguishable one from another.

Hence, the fundamental problem in studies of variation is to be able 
to differentiate the variation and to trace each contributing factor or 
group of factors to its source. Although an analysis of this kind is of 
special significance in experimental work, there are many situations in 
research work where differentiation of sources of variation in observa- 
tional data is an essential part of the analysis. A general problem is 
that of determining whether two or more samples may be regarded as
random samples from the same homogeneous population.

An Application of Analysis of Variation. We shall illustrate the main
ideas of the above discussion by presenting an example. Let us take the
data recorded in Table 52 which represent the mental ages in months of
6 samples of 6 pupils each, each randomly chosen from the same grade in
6 different schools. Suppose that the data are required to answer
the question: Is there evidence that the mental ages of the pupils are the
same for the same grade in the 6 schools?

The variability in the mental ages of the pupils from the same school
is so considerable that it would be hard to reach a conclusion on the point
at issue from a mere inspection of the data in the table. Diagram 5
brings out the situation more clearly; but even after examining it can we
say that the differences in the means are significant? It is at this point
that statistical theory can give assistance by determining how much
consideration should be given to the apparent differences in means, which
are hard to discern because of the residual fluctuation, z, due to chance
causes. Specifically, the question is: What is the probability that the
observed differences in the mean values of the 6 schools might have
arisen simply through random sampling errors?

TABLE 52
MENTAL AGES OF 6 RANDOM SAMPLES OF 6 PUPILS EACH FROM 6 DIFFERENT SCHOOLS

                     Mental ages in schools
Individual      1      2      3      4      5      6
    1          158    156    160    159    164    153
    2          157    155    158    155    163    148
    3          153    154    156    148    162    145
    4          151    153    155    147    160    144
    5          144    151    150    146    154    144
    6          143    149    145    145    151    136
  Mean         151    153    154    150    159    145

Grand mean = 152

[Figure 5. The components of variation in the mental ages of 36 pupils.
Vertical axis: M.A. (mental age); horizontal axis: sample number.
Plotted: individual M.A., sample means, and the grand mean (152).]

Statistical means enable us [...] value. Since the form of the s[...]
other and of the values of the assignable factors, if these exist; and they
are normally distributed about zero with equal standard deviation.

We shall briefly describe the model the statistician sets up for the
description of the situation discussed here. If the X of Equation (10.01)
represents the mental age of a single individual, then a on the right-hand
side may be considered as a general average or mean of the individuals
in all the samples, and b as a contribution, positive or negative,
associated with a particular sample. If there are changes in mental age from
sample to sample which affect X, then the values of b, namely, $b_1, b_2,
\ldots, b_6$ for the 6 samples, will differ; if there are no such changes, then

$$b_1 = b_2 = \cdots = b_6 = 0 \tag{10.02}$$


The random or residual variations, $z$, among the mental ages of
individuals from the same school obscure the real situation about the true
value as estimated from the sample. Hence, it is not possible to take the
difference between the observed sample mean and the grand mean as
equal to $b_t$ ($t = 1, 2, \ldots, 6$). Therefore, it becomes necessary to
answer the question: Taking into account the observed variation among
mental ages of individuals in the same school sample, what is the
probability that the 6 obtained sample means would differ so much among
themselves because of random sampling fluctuations if, in fact, Equation
(10.02) were true?

The method used by the statistician to solve this problem is outlined
below.

Let $X_{ti}$ be the mental age score of the $i$th individual in the $t$th sample;
$i = 1, 2, \ldots, 6$; $t = 1, 2, \ldots, 6$. $\bar{X}_t$ is the mean of the
observations in the $t$th sample and $\bar{X}$ is the grand mean of the 36 observations.
As illustrated for one individual from the third sample in Diagram 5, the
mental age score of the $i$th individual in the $t$th sample may be considered
as the sum of three components. Thus:

$$X_{ti} = \bar{X} + (\bar{X}_t - \bar{X}) + (X_{ti} - \bar{X}_t) \tag{10.03}$$

For example, the mental-age score (164) of the first individual in the
sample from the fifth school is equal to 152 + (159 − 152) + (164 − 159).
Referring to Equation (10.01), $\bar{X}$ may be considered as an estimate of $a$;
$(\bar{X}_t - \bar{X})$ as an estimate of $b_t$; and $(X_{ti} - \bar{X}_t)$ of the residual variation $z_{ti}$.
These are estimates because we have observations only from a random
sample from each of the schools.

The significance of the difference $\bar{X}_t - \bar{X}$ ($t = 1, 2, \ldots, 6$) or the
acceptance of the hypothesis represented by Equation (10.02) is based
on the magnitude of the components $\bar{X}_t - \bar{X}$ compared with $X_{ti} - \bar{X}_t$.
A precise statistical test of the significance involves the use of the
following identity:

$$\begin{aligned}
\sum_t \sum_i (X_{ti} - \bar{X})^2 &= \sum_t \sum_i \left[(\bar{X}_t - \bar{X}) + (X_{ti} - \bar{X}_t)\right]^2 \\
&= \sum_t \sum_i (\bar{X}_t - \bar{X})^2 + \sum_t \sum_i (X_{ti} - \bar{X}_t)^2 + 2 \sum_t \sum_i (\bar{X}_t - \bar{X})(X_{ti} - \bar{X}_t) \\
&= \sum_t \sum_i (\bar{X}_t - \bar{X})^2 + \sum_t \sum_i (X_{ti} - \bar{X}_t)^2
\end{aligned} \tag{10.04}$$

since the product term will vanish because $\sum_i (X_{ti} - \bar{X}_t) = 0$.
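The decomposition in Equations (10.03) and (10.04) can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the data are the Table 52 values as read so that they agree with the printed school means, and the names `schools`, `components`, and the `*_ss` variables are illustrative assumptions.

```python
# Mental ages from Table 52: one list per school (sample) of six pupils.
schools = [
    [158, 157, 153, 151, 144, 143],
    [156, 155, 154, 153, 151, 149],
    [160, 158, 156, 155, 150, 145],
    [159, 155, 148, 147, 146, 145],
    [164, 163, 162, 160, 154, 151],
    [153, 148, 145, 144, 144, 136],
]


def mean(values):
    return sum(values) / len(values)


all_scores = [x for school in schools for x in school]
grand_mean = mean(all_scores)
school_means = [mean(school) for school in schools]


def components(t, i):
    """Split the score of pupil i in school t (both 0-based) as in (10.03)."""
    score = schools[t][i]
    return grand_mean, school_means[t] - grand_mean, score - school_means[t]


# The three sums of squares of identity (10.04), plus the cross-product term.
total_ss = sum((x - grand_mean) ** 2 for x in all_scores)
between_ss = sum(
    len(s) * (m - grand_mean) ** 2 for s, m in zip(schools, school_means)
)
within_ss = sum((x - m) ** 2 for s, m in zip(schools, school_means) for x in s)
cross_term = sum(
    (m - grand_mean) * (x - m) for s, m in zip(schools, school_means) for x in s
)

print(components(4, 0))  # first pupil of the fifth school
print(total_ss, between_ss, within_ss, cross_term)
```

The first printed tuple corresponds to the worked example 152 + (159 − 152) + (164 − 159), and the cross-product term is zero, which is why the total sum of squares splits into exactly two parts.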

Before the magnitude of the two components can be compared, they
must be divided by the quantities known as the number of degrees of
freedom, which are $r$ and $N - q$, respectively, where $r$ is the number of
relations used to define the hypothesis, that is, 5 in this problem. There
are 6 independent values; therefore, $q = 6$. If the hypothesis tested is
true, then 5 relations hold among the 6 parameters, namely, $b_1 = 0$,
$b_2 = 0$, $b_3 = 0$, $b_4 = 0$, $b_5 = 0$. Thus, $r = 5$; $N - q = 36 - 6 = 30$.

The criterion is

$$F = \frac{\sum_t \sum_i (\bar{X}_t - \bar{X})^2 / r}{\sum_t \sum_i (X_{ti} - \bar{X}_t)^2 / (N - q)} \tag{10.05}$$

or

$$z = \tfrac{1}{2} \log_e F \tag{10.06}$$

Using the Tables of F or z, respectively, we obtain the 5 per cent and 1 per
cent levels of significance against which the obtained value of F or z is
checked.


The numerical solution for the example is carried out as follows:

First, it is convenient to reduce the values in Table 52 by subtracting
150 from each value, obtaining the following:

Individual      1      2      3      4      5      6    Total
    1           8      6     10      9     14      3
    2           7      5      8      5     13     -2
    3           3      4      6     -2     12     -5
    4           1      3      5     -3     10     -6
    5          -6      1      0     -4      4     -6
    6          -7     -1     -5     -5      1    -14
  Total         6     18     24      0     54    -30      72

We then calculate the “within schools” sum of squares, that is, the
sum of the squares of the deviations of the mental ages of the individuals
in a school (sample) about their school means, as follows:

$$\sum_t \sum_i (X_{ti} - \bar{X}_t)^2 = 8^2 + 7^2 + 3^2 + \cdots + (-14)^2 - \frac{6^2 + 18^2 + 24^2 + 0^2 + 54^2 + (-30)^2}{6} = 1638 - 792 = 846$$

We next calculate the “between schools” sum of squares, that is, the
sum of squares of deviations of the school means about the grand mean,
as follows:

$$\sum_t \sum_i (\bar{X}_t - \bar{X})^2 = \frac{6^2 + 18^2 + 24^2 + 0^2 + 54^2 + (-30)^2}{6} - \frac{(72)^2}{36} = 792 - 144 = 648$$

The total sum of squares, that is, the sum of squares of the deviations
of the 36 individual mental ages from their grand mean, is obtained as
follows:

$$\sum_t \sum_i (X_{ti} - \bar{X})^2 = 8^2 + 7^2 + \cdots + (-14)^2 - \frac{(72)^2}{36} = 1638 - 144 = 1494$$

The respective sums of squares with the appropriate number of degrees
of freedom are recorded in the customary analysis-of-variance table,
Table 53. The values under the column heading “mean square” are
obtained by dividing the sum of squares in each row by the corresponding
number of degrees of freedom. By applying Formula (10.05), we obtain as the
observed value of the criterion, $F_0$:

$$F_0 = \frac{129.6}{28.2} = 4.6$$

TABLE 53
ANALYSIS OF VARIANCE OF THE MENTAL AGES OF THE 36 PUPILS IN 6 DIFFERENT SCHOOLS

Source of variation    d.f.    Sum of squares    Mean square     F     Hypothesis
Between schools          5          648             129.6       4.6    Rejected
Within schools          30          846              28.2
Total                   35         1494

We then enter the F-table with $n_1 = 5$, $n_2 = 30$, and find that
$F_{.01} = 3.7$. Since our obtained value, 4.6, is greater than 3.7, we may
conclude that there is a significant difference between the mean mental
ages of the schools. We may also say that the null hypothesis under
test, that is, the hypothesis stated in Equation (10.02), is rejected.
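The arithmetic of the worked example (coding by subtracting 150, the three sums of squares, the mean squares, and the criteria F and z) can be retraced with a short program. This is a minimal sketch under stated assumptions: the function name `one_way_anova` and its `origin` parameter are illustrative and do not come from the text, and the data are the Table 52 values as read so that they agree with the printed school means.

```python
import math


def one_way_anova(groups, origin=0):
    """One-way analysis of variance by the shortcut sums used in the text.

    `origin` is subtracted from every value first (the text subtracts 150);
    this coding does not change any of the sums of squares.
    """
    coded = [[x - origin for x in g] for g in groups]
    n_total = sum(len(g) for g in coded)
    raw_ss = sum(x * x for g in coded for x in g)            # 8^2 + 7^2 + ...
    group_term = sum(sum(g) ** 2 / len(g) for g in coded)    # (6^2 + 18^2 + ...)/6
    correction = sum(sum(g) for g in coded) ** 2 / n_total   # 72^2 / 36
    between_ss = group_term - correction
    within_ss = raw_ss - group_term
    df_between = len(coded) - 1        # r in the text
    df_within = n_total - len(coded)   # N - q in the text
    f_ratio = (between_ss / df_between) / (within_ss / df_within)
    return {
        "between_ss": between_ss,
        "within_ss": within_ss,
        "total_ss": raw_ss - correction,
        "df": (df_between, df_within),
        "F": f_ratio,
        "z": 0.5 * math.log(f_ratio),  # Equation (10.06)
    }


schools = [
    [158, 157, 153, 151, 144, 143],
    [156, 155, 154, 153, 151, 149],
    [160, 158, 156, 155, 150, 145],
    [159, 155, 148, 147, 146, 145],
    [164, 163, 162, 160, 154, 151],
    [153, 148, 145, 144, 144, 136],
]
result = one_way_anova(schools, origin=150)
print(result)
```

The observed F is then compared with the tabled 1 per cent value for 5 and 30 degrees of freedom exactly as in the text; the sketch deliberately does not look up the table value itself.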

Process of the Analysis of Variance. As has been observed in the 
example above, the actual process in the analysis of variance consists in 
breaking up the total sum of squares of deviations of the observations 
from the grand mean into independent portions assigned to certain 
factors. The structure of these component parts, usually determined
by the design of experiment, is specified by the number of degrees of
freedom or by the number of independent comparisons, which, like the
corresponding sums of squares, are additive in character. Therefore,
the method is equally valid for small and large samples.

Analysis of Covariance. Another useful extension of the general 
analysis-of-variance method is the analysis of covariance, also developed 
by Fisher. In this analysis, the process consists in breaking down the 
sum of products of deviations of any two variates from their means and 
assigning the respective components to specified sources. One of the
most useful applications of the covariance method is in sorting out the
[...] particularly in experimentation. This operation [...] it has m[...]
observational material. The effi[...] samples may be regarded as h[...]
population is clearly illustrat[...] biometric method used for suc[...]
to calculate independently a standard error for each of the possible
comparisons of the means of several samples. The labor involved in
this procedure is not its only objection. The chief objection is that in
many cases the obtained estimates of standard errors may not differ
beyond merely sampling errors. In such cases it may be concluded
that the larger part of the observed differ[...] sampling errors, and
that a more a[...]cated analysis would result by pooli[...]
from the different means and by applying the combined estimate in the
test of significance. This change introduced by the analysis-of-variance 
method serves to provide an exact test of the null hypothesis and hence is 
used habitually by the modern research worker. Thus the method makes 
use of the relevant information contained in the data, since it takes into 
account the sampling distribution of statistics of the same kind. 

The foregoing discussion serves to give a general account of the main 
ideas underlying the analysis of variation. Accompanied by the illustra- 
tive example, this discussion should be suitable as an introduction for the 
reader to the application of the analysis of variance to the simpler prob- 
lems. Probably, however, the research worker will profit from a more 
complete and rigorous study of the statistical principles underlying such 
a powerful tool as the analysis of variance and covariance. It is fre- 
quently observed that the formulation of a problem in statistical terms, 
which requires an orderly arrangement of the known results and an aware- 
ness of the assumptions and how they may be tested, assists in making 
clear the essential features of a problem hitherto not clearly visualized. 

Before a number of practical applications of the method of analysis 
of variance and covariance are demonstrated, the next section will 
present the systematic formulation and solution of the problems under- 
lying these methods. This section may be omitted by the reader not 
interested in the mathematical developments. He can proceed directly
to the practical problems in Chapter XI.

MATHEMATICAL FOUNDATIONS OF ANALYSIS OF VARIANCE AND 
COVARIANCE 


Mathematical Ratification. 1. Suppose we have a normal distribution
with mean $\mu$ and standard deviation $\sigma$. It is well known that if
we pick independently all the possible samples of size $n$ from this
population and denote the random effects for each sample by

$$z_t = Y_t - \mu \qquad (t = 1, 2, \ldots, n) \tag{1.01}$$

then the mean value of $z$ will be normally distributed with mean 0 and
standard deviation $\sigma/\sqrt{n}$. So we may define, in this case, the maximum
likelihood estimate of the variance, $\sigma^2$, of the population as

$$\sigma^2 = n \sigma_{\bar{z}}^2 \tag{1.02}$$

where $\sigma_{\bar{z}}^2$ is the variance of sampling means of the random effects.

The analysis-of-variance method consists in the breaking up of the
total variance into independent parts which can produce independently
the maximum-likelihood estimates of $\sigma^2$ due to the random effects alone.
For instance, if we have $p$ groups which are chosen by a certain criterion,
then we immediately know in advance that these groups are more or less
heterogeneous with respect to their corresponding means. However, we
pretend to assume that they are randomly chosen from the whole
population in presenting the mathematical formulation as follows:

$$z_{st} = Y_{st} - \mu - a_s \qquad (s = 1, \ldots, p;\ t = 1, \ldots, n) \tag{1.03}$$

where $z_{st}$ is the random effect; $Y_{st}$ is an observation of the $t$th individual
in the $s$th group; $\mu$ is the population mean; and $a_s$ is the deviation from
the population mean for the $s$th group. By the maximum-likelihood
method we can easily get two independent estimates of $\sigma^2$ from our
sample:

$$\sigma_w^2 = \frac{1}{p(n - 1)} \sum_s \sum_t (Y_{st} - \bar{Y}_s)^2 \tag{1.04}$$

$$\sigma_b^2 = \frac{n}{p - 1} \sum_s (\bar{Y}_s - \bar{Y})^2 \tag{1.05}$$

where $\bar{Y}_s = \frac{1}{n} \sum_t Y_{st}$ and $\bar{Y} = \frac{1}{pn} \sum_s \sum_t Y_{st}$.

By using Fisher’s z-test [...] determine whether or not th[...]

Ordinarily, we are interested only in knowing if t[...] same means. So
we often make the test on the basis of [...], which is called [...].
However, the result of significance of the variance $\sigma_b^2$, which is
called the variance of “between,” implies three alternative explanations.
These groups have

(1) Different means and different variabilities,

(2) The same mean and different variabilities,

(3) Different means and the same variability.

Therefore, if we wish to rule out the first two explanations, we have to
test the hypothesis $\sigma_s = \sigma$ for these groups. This may be done by
using the $L_1$-criterion.¹

The same mathematical approach can be applied to the problems of [...].
In this case, we have independent estimates [...] in addition to those due
to the main factors.²

[...] present assumptions which should be [...]:³

(1) [...] distribution should be normal. This assumption, however, is not
especially important. Eden and Yates (Ref. 2) showed that even with a
population departing considerably from normality, the effectiveness of
the z-distribution still held. The normality and independence of the
random elements in successive observations has been pointed out on
page 212.

¹ For the method of using the $L_1$-criterion, see page 82.

² For a detailed consideration of these interactions, see Refs. 13 and 14 of
Chapter III.

³ For assumptions underlying the analysis of variance, see Ref. 3; for a
discussion of the consequences when any assumption is not satisfied, see
Ref. Д.

(2) All groups of a certain criterion or of the combination of more 
than one criterion should be randomly chosen from the subpopulation 
having the same criterion or having the same combination of more than 
one criterion. For instance, if we wish to select two groups in a school 
population, one of the third grade and the other of the fourth grade, we 
must choose randomly from the respective subpopulations. This 
assumption is the keystone of the analysis-of-variance technique. Fail- 
ure to fulfill this assumption gives biased results. 

(3) The subgroups under investigation should have the same variabil- 
ity. We should test this assumption before we run the analysis of vari- 
ance. Otherwise, a false interpretation of the results may follow. 

Maximum-Likelihood Solution of Analysis-of-Variance Problems.
With One Classification. 2. Before we develop a general solution of the
problems with any number of classifications, we start with the derivation
of the solution for the problems with only one classification. The
frequencies in different subclasses will always be assumed to be equal. We
denote by $Y_{st}$ the score obtained by the $t$th individual in the $s$th subclass.
The basic assumption in the analysis of variance is that we may write

$$Y_{st} = M + A_s + z_{st} \tag{2.01}$$

where $s = 1, \ldots, p$; $t = 1, \ldots, n$; $p$ denotes the number of
subclasses; $n$ denotes the number of individuals in each subclass; $M$ is defined
as the general mean; $A_s$ is the deviation due to the $s$th subclass; and $z_{st}$
represents the random effect for the $t$th individual in the $s$th subclass.
To minimize the variance of $z_{st}$ by using the maximum-likelihood method,
we first write

$$\chi^2 = \sum_s \sum_t (Y_{st} - M - A_s)^2 + \lambda \sum_s A_s \tag{2.02}$$

where

$$\sum_s A_s = 0 \tag{2.03}$$

which is a restriction imposed on (2.01); and $\lambda$ is an undetermined
multiplier of Lagrange. Differentiating $\chi^2$ partially with respect to $M$ and $A_s$,
setting the resulting equations equal to zero, and solving, we obtain

$$M = \frac{1}{N} \sum_s \sum_t Y_{st} = \bar{Y} \qquad (N = pn) \tag{2.04}$$

$$A_s = \bar{Y}_s - \bar{Y} - \frac{\lambda}{2n} \tag{2.05}$$

From (2.03) and (2.05), we obtain

$$\sum_s A_s = \sum_s \bar{Y}_s - p\bar{Y} - \frac{p\lambda}{2n} = 0 \tag{2.06}$$

which reduces to

$$\lambda = 0 \tag{2.07}$$

By the method of elimination, we have

$$\chi_a^2 = \sum_s \sum_t (Y_{st} - \bar{Y}_s)^2 = \sum_s \sum_t Y_{st}^2 - n \sum_s \bar{Y}_s^2 \tag{2.08}$$

The hypothesis we wish to test is

$$H_0: A_s = 0 \tag{2.09}$$

that is, the hypothesis that there is no significant difference between
these subclasses. Assuming that $H_0$ is true, we have from (2.02),

$$\chi^2 = \sum_s \sum_t (Y_{st} - M)^2 \tag{2.10}$$

Minimizing $\chi^2$ with respect to $M$, we obtain

$$M = \bar{Y} \tag{2.11}$$

Substituting this value into Equation (2.10), we obtain the relative
minimum value

$$\chi_r^2 = \sum_s \sum_t (Y_{st} - \bar{Y})^2 = \sum_s \sum_t Y_{st}^2 - N\bar{Y}^2 = \chi_a^2 + n \sum_s \bar{Y}_s^2 - N\bar{Y}^2 = \chi_a^2 + \chi_b^2 \tag{2.12}$$

The additive property of the sum of squares is readily demonstrated in
(2.12). All the results obtained may be summarized as in Table 54:

Source of variation       D.F.       Sum of squares
Within subclasses         N - p      $\chi_a^2$
Between subclasses        p - 1      $\chi_b^2$
Total                     N - 1      $\sum_s \sum_t Y_{st}^2 - N\bar{Y}^2$

With Two Classifications. 3. Now we shall work out the equations
with two classifications, say column and row. We denote by $Y_{s_1 s_2 t}$ the
score obtained by the $t$th individual in the $s_1$th column and the $s_2$th row.
The basic assumption in the analysis of variance is that we may write

$$Y_{s_1 s_2 t} = M + A_{s_1} + B_{s_2} + I_{s_1 s_2} + z_{s_1 s_2 t} \tag{3.01}$$

subject to the following restrictions:

$$\sum_{s_1} A_{s_1} = 0 \tag{3.02}$$

$$\sum_{s_2} B_{s_2} = 0 \tag{3.03}$$

$$\sum_{s_1} \sum_{s_2} I_{s_1 s_2} = 0 \tag{3.04}$$

where $s_1 = 1, \ldots, p_1$; $s_2 = 1, \ldots, p_2$; $t = 1, \ldots, n$; $p_1$ denotes
the number of columns; $p_2$ denotes the number of rows; $n$ denotes the
number of individuals in each subclass; $M$ is defined as the general mean;
$A_{s_1}$ is the deviation due to the $s_1$th column; $B_{s_2}$ is the deviation due to the
$s_2$th row; $I_{s_1 s_2}$ represents the influence of the interaction between column
and row; and $z_{s_1 s_2 t}$ represents the random effects. To obtain the solution,
we first write

$$\chi^2 = \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - M - A_{s_1} - B_{s_2} - I_{s_1 s_2})^2 + \alpha_1 \sum_{s_1} A_{s_1} + \alpha_2 \sum_{s_2} B_{s_2} + \alpha_3 \sum_{s_1} \sum_{s_2} I_{s_1 s_2} \tag{3.05}$$

where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the undetermined multipliers of Lagrange.
Minimizing $\chi^2$ with respect to $M$, $A_{s_1}$, $B_{s_2}$, and $I_{s_1 s_2}$, we obtain

$$M = \frac{1}{N} \sum_{s_1} \sum_{s_2} \sum_t Y_{s_1 s_2 t} = \bar{Y}_{\cdot\cdot} \qquad (N = p_1 p_2 n) \tag{3.06}$$

$$A_{s_1} = \frac{1}{n p_2} \sum_{s_2} \sum_t Y_{s_1 s_2 t} - M - \frac{\alpha_1}{2 n p_2} = \bar{Y}_{s_1 \cdot} - \bar{Y}_{\cdot\cdot} - \frac{\alpha_1}{2 n p_2} \tag{3.07}$$
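$$B_{s_2} = \frac{1}{n p_1} \sum_{s_1} \sum_t Y_{s_1 s_2 t} - M - \frac{\alpha_2}{2 n p_1} = \bar{Y}_{\cdot s_2} - \bar{Y}_{\cdot\cdot} - \frac{\alpha_2}{2 n p_1} \tag{3.08}$$

$$I_{s_1 s_2} = \frac{1}{n} \sum_t Y_{s_1 s_2 t} - M - A_{s_1} - B_{s_2} - \frac{\alpha_3}{2n} \tag{3.09}$$

From (3.02) and (3.07), we have

$$\sum_{s_1} A_{s_1} = \sum_{s_1} \bar{Y}_{s_1 \cdot} - p_1 \bar{Y}_{\cdot\cdot} - \frac{p_1 \alpha_1}{2 n p_2} = 0 \tag{3.10}$$

which immediately reduces to

$$\alpha_1 = 0 \tag{3.11}$$

Similarly,

$$\alpha_2 = \alpha_3 = 0 \tag{3.12}$$

By the method of elimination, we get

$$\chi_a^2 = \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - \bar{Y}_{s_1 s_2})^2 = \sum_{s_1} \sum_{s_2} \sum_t Y_{s_1 s_2 t}^2 - n \sum_{s_1} \sum_{s_2} \bar{Y}_{s_1 s_2}^2 \tag{3.13}$$

The hypothesis we wish to test first is

$$H_{01}: I_{s_1 s_2} = 0 \tag{3.14}$$

that is, the hypothesis that there is no influence of the interaction
between column and row. Assuming that $H_{01}$ is true, we have, from
(3.05),

$$\chi^2 = \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - M - A_{s_1} - B_{s_2})^2 + \beta_1 \sum_{s_1} A_{s_1} + \beta_2 \sum_{s_2} B_{s_2} \tag{3.15}$$

where $\beta_1$ and $\beta_2$ are the undetermined multipliers of Lagrange.
Minimizing $\chi^2$ with respect to $M$, $A_{s_1}$, and $B_{s_2}$, we obtain

$$M = \bar{Y}_{\cdot\cdot} \tag{3.16}$$

$$A_{s_1} = \bar{Y}_{s_1 \cdot} - \bar{Y}_{\cdot\cdot} - \frac{\beta_1}{2 n p_2} \tag{3.17}$$

$$B_{s_2} = \bar{Y}_{\cdot s_2} - \bar{Y}_{\cdot\cdot} - \frac{\beta_2}{2 n p_1} \tag{3.18}$$

where

$$\beta_1 = \beta_2 = 0 \tag{3.19}$$
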
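Substituting these values in (3.15) and simplifying, we obtain the
relative minimum value

$$\begin{aligned}
\chi_{r_1}^2 &= \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - \bar{Y}_{s_1 \cdot} - \bar{Y}_{\cdot s_2} + \bar{Y}_{\cdot\cdot})^2 \\
&= \chi_a^2 + n \sum_{s_1} \sum_{s_2} (\bar{Y}_{s_1 s_2} - \bar{Y}_{s_1 \cdot} - \bar{Y}_{\cdot s_2} + \bar{Y}_{\cdot\cdot})^2 \\
&= \chi_a^2 + n \sum_{s_1} \sum_{s_2} \bar{Y}_{s_1 s_2}^2 - p_2 n \sum_{s_1} \bar{Y}_{s_1 \cdot}^2 - p_1 n \sum_{s_2} \bar{Y}_{\cdot s_2}^2 + N \bar{Y}_{\cdot\cdot}^2 \\
&= \chi_a^2 + \chi_1^2
\end{aligned} \tag{3.20}$$

Then we may test the relative hypothesis on the basis of $\chi_{r_1}^2$:

$$H_{02}: A_{s_1} = 0 \tag{3.21}$$

that is, the hypothesis that there is no significant difference between
columns. Assuming that $H_{02}$ is true, we may write

$$\chi^2 = \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - M - B_{s_2})^2 + \gamma \sum_{s_2} B_{s_2} \tag{3.22}$$

where $\gamma$ is the undetermined multiplier of Lagrange. Minimizing $\chi^2$
with respect to $M$ and $B_{s_2}$, we have

$$M = \bar{Y}_{\cdot\cdot} \tag{3.23}$$

$$B_{s_2} = \bar{Y}_{\cdot s_2} - \bar{Y}_{\cdot\cdot} - \frac{\gamma}{2 n p_1} \tag{3.24}$$

where

$$\gamma = 0 \tag{3.25}$$

Substituting these values into (3.22) and simplifying, we obtain

$$\begin{aligned}
\chi_{r_2}^2 &= \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - \bar{Y}_{\cdot s_2})^2 = \chi_a^2 + \chi_1^2 + p_2 n \sum_{s_1} \bar{Y}_{s_1 \cdot}^2 - N \bar{Y}_{\cdot\cdot}^2 \\
&= \chi_a^2 + \chi_1^2 + \chi_2^2
\end{aligned} \tag{3.26}$$

Finally, we may test the relative hypothesis on the basis of $\chi_{r_2}^2$:

$$H_{03}: B_{s_2} = 0 \tag{3.27}$$

that is, the hypothesis that there is no significant difference between
rows. Assuming that $H_{03}$ is true and proceeding as before, we obtain

$$\begin{aligned}
\chi_{r_3}^2 &= \sum_{s_1} \sum_{s_2} \sum_t (Y_{s_1 s_2 t} - \bar{Y}_{\cdot\cdot})^2 = \chi_a^2 + \chi_1^2 + \chi_2^2 + p_1 n \sum_{s_2} \bar{Y}_{\cdot s_2}^2 - N \bar{Y}_{\cdot\cdot}^2 \\
&= \chi_a^2 + \chi_1^2 + \chi_2^2 + \chi_3^2
\end{aligned} \tag{3.28}$$

From (3.28), the additive property of the sum of squares is again clearly
demonstrated. It is also noted, in the case of equal frequencies in
subclasses, that there is only one answer for each hypothesis tested, no
matter what the order of testing may be. All the results obtained
may be summarized as in Table 55.

TABLE 55
ANALYSIS OF VARIANCE FOR THE PROBLEMS OF DOUBLE CLASSIFICATION

Source of variation      D.F.                    Sum of squares
Within                   N - p_1 p_2             $\chi_a^2$
Column × row             (p_1 - 1)(p_2 - 1)      $\chi_1^2$
Columns                  p_1 - 1                 $\chi_2^2$
Rows                     p_2 - 1                 $\chi_3^2$
Total                    N - 1                   $\chi_{r_3}^2$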
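The additive property for the double classification with equal subclass frequencies can be illustrated numerically. The sketch below is not from the text: the function `two_way_anova`, its return keys, and the small 2 × 3 data set with two observations per subclass are hypothetical, chosen only so that the sums can be followed by hand.

```python
def two_way_anova(cells):
    """Sums of squares for a column-by-row layout with equal cell sizes.

    cells[a][b] is the list of n observations in column a and row b.
    """
    p1, p2, n = len(cells), len(cells[0]), len(cells[0][0])
    total_n = p1 * p2 * n
    grand = sum(y for col in cells for cell in col for y in cell) / total_n
    cell_mean = [[sum(cell) / n for cell in col] for col in cells]
    col_mean = [sum(cell_mean[a]) / p2 for a in range(p1)]
    row_mean = [sum(cell_mean[a][b] for a in range(p1)) / p1 for b in range(p2)]
    within = sum(
        (y - cell_mean[a][b]) ** 2
        for a in range(p1) for b in range(p2) for y in cells[a][b]
    )
    interaction = n * sum(
        (cell_mean[a][b] - col_mean[a] - row_mean[b] + grand) ** 2
        for a in range(p1) for b in range(p2)
    )
    columns = p2 * n * sum((m - grand) ** 2 for m in col_mean)
    rows = p1 * n * sum((m - grand) ** 2 for m in row_mean)
    total = sum((y - grand) ** 2 for col in cells for cell in col for y in cell)
    return {
        "within": within,
        "interaction": interaction,
        "columns": columns,
        "rows": rows,
        "total": total,
    }


# Hypothetical data: 2 columns, 3 rows, 2 individuals per subclass.
cells = [
    [[1, 3], [2, 4], [6, 8]],
    [[5, 7], [4, 6], [3, 5]],
]
ss = two_way_anova(cells)
print(ss)
```

For these made-up numbers the four components add up to the total sum of squares, whatever order the hypotheses are tested in; with unequal subclass frequencies that would no longer hold in general, which is why the text restricts itself to equal frequencies.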


4. In general, if we have a problem of $k$ classifications, the
mathematical expression of the score made by the $t$th individual in the $s_1$th
group of classification A, the $s_2$th group of classification B, . . . , and
the $s_k$th group of classification R is as follows:

$$Y_{s_1 s_2 \cdots s_k t} = M + A_{s_1} + B_{s_2} + \cdots + R_{s_k} + I_{s_1 s_2} + \cdots + I_{s_1 s_2 \cdots s_k} + z_{s_1 s_2 \cdots s_k t} \tag{4.01}$$

where $s_1 = 1, \ldots, p_1$; $s_2 = 1, \ldots, p_2$; . . . ; $s_k = 1, \ldots, p_k$; $p_1$
denotes the number of groups in classification A; $p_2$ denotes the number
of groups in classification B; . . . ; $p_k$ denotes the number of groups
of classification R; $M$ is the grand mean; A, B, . . . , and R are the
measures of the main effects with respect to their own subscripts; I's are
the measures of the interactions with respect to their own subscripts; and
$z_{s_1 s_2 \cdots s_k t}$ is the error. The solutions for the sum of squares of each
source of variation are as follows:
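1. Within:

$$\chi_a^2 = \underbrace{\sum_{s_1} \cdots \sum_{s_k}}_{k\text{-fold}} \sum_t (Y_{s_1 \cdots s_k t} - \bar{Y}_{s_1 \cdots s_k})^2 \tag{4.02}$$
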
2. Interactions and main effects: 


(4.03) iJ, > DE uas > & (2, e Eu) 
— 1 — 


k-fold subscripts 


r-fold subscripts - r-1 fold - анкер 
- Жж», x eoim) ice. i€-€j 
IGI Mam (2%) 
1 8i a) == 1 m 
جت‎ r-2 
„2 0918 subscripts 
е xg Vut e y > 
+ (DNF 
where $i_1, \ldots, i_r = 1, 2, \ldots, k$; $k$ is the number of classifications in the
whole study; $r$ is the number of classifications under calculation; $s_i$ (or $s_j$)
$= 1, 2, \ldots, p_i$ (or $p_j$); $\delta_m$ is so determined that
(4.04) A eg i pn (т-1,%%%,7) 
8i 8) 


and throughout the general ex[...] which are not connected with [...]
be ruled out. For instance, i[...] interaction between A and B, [...]

$$p_3 \cdots p_k\, n \sum_{s_1} \sum_{s_2} (\bar{Y}_{s_1 s_2} - \bar{Y}_{s_1} - \bar{Y}_{s_2} + \bar{Y})^2 = p_3 \cdots p_k\, n \sum_{s_1} \sum_{s_2} \bar{Y}_{s_1 s_2}^2 - p_2 p_3 \cdots p_k\, n \sum_{s_1} \bar{Y}_{s_1}^2 - p_1 p_3 \cdots p_k\, n \sum_{s_2} \bar{Y}_{s_2}^2 + N \bar{Y}^2 \tag{4.05}$$

For another example, if we calculate the sum of squares for the main
effect A, the formula becomes*

$$p_2 p_3 \cdots p_k\, n \sum_{s_1} \bar{Y}_{s_1}^2 - N \bar{Y}^2 \tag{4.06}$$

CHAPTER XI


APPLICATIONS OF THE ANALYSIS OF VARIANCE AND 
COVARIANCE METHOD 


We shall now apply the method of analysis of variance and covariance 
to a number of the simpler practical problems met with by the research 
worker. The application to some of the more complex types of situations 
in which these methods are indispensable will be made in the sequel 
to the results of specific experimental designs. 

We shall proceed first by ap[...] Chapter X to the mathematical
solution of the problem.


Problem XI.1. Single classification with equal representation in
classes. Let us take the problem of measuring the resemblance in
intelligence of identical twins reared apart as reported by Newman, Freeman,
and Holzinger (Ref. 7). The data in the form in which we shall use them
in this analysis are given in Table 56. We must first see if we can
translate our problem into mathematical language. If it is amenable to such
a translation, it can be expressed mathematically as a problem of testing
statistical hypotheses. Mathematically, the relationship may be
expressed thus:

$$X_{ti} = A + C_t + z_{ti} \tag{11.01}$$

[...] ($t = 1, \ldots, 19$); $X_{ti}$ is the mental age of [...] twins; $A$ is a
measure of the common mental age of the group tested; $C_t$ is a measure of
the mental age of the $t$th twin pair; $z_{ti}$ is the measure of the random
effects. The restriction is

$$\sum_t C_t = 0 \tag{11.02}$$

We must first test the hypothesis that the variability of the mental-
age scores is the same for all twin pairs, since this is the fundamental
assumption underlying the analysis of variance. The hypothesis may be
written

$$H_0: \sigma_t = \sigma \tag{11.03}$$

where $\sigma_t$ denotes the standard deviation of the $t$th twin pair. This
hypothesis is tested by means of the $L_1$-test (see page 82). The
calculations are carried out as indicated in Table 57. With a value of $L_1 = .117$,
$k = 19$, and d.f. = 1, we refer to Nayer’s table (Table V, Appendix)
and find that our value is greater than the table value ($L_{.01} = .096$).
Hence, we may accept the hypothesis $H_0$ at the 1 per cent level.¹ We can
now proceed to apply the analysis of variance method.

TABLE 56
MENTAL AGES OF 19 PAIRS OF IDENTICAL TWINS REARED APART

                    Mental age              Sum            Difference
Twin pair        X_t1        X_t2       X_t1 + X_t2      |X_t1 - X_t2|
    1            163         186           349               23
    2            126         149           275               23
    3            191         194           385                3
    4            170         204           374               34
    5            171         178           349                7
    6            195         180           375               15
    7            170         172           342                2
    8            170         142           312               28
    9            195         185           380               10
   10            187         195           382                8
   11            176         222           398               46
   12            223         210           433               13
   13            181         182           363                1
   14            164         161           325                3
   15            175         171           346                4
   16            123         120           243                3
   17            192         175           367               17
   18            184         148           332               36
   19            168         151           319               17
Sum              3,324       3,325         6,649
Sum of squares   590,946     593,531       2,361,311         7,643

We use the maximum-likelihood procedure of estimating the sums of
squares of the different components as shown below. We first write

$$\phi = \sum_t \sum_i (X_{ti} - A - C_t)^2 + 2\lambda \sum_t C_t \tag{11.04}$$

where $\lambda$ is the multiplier of Lagrange. Minimizing $\phi$ with respect to $A$,
$C_t$, and $\lambda$, that is, differentiating partially with respect to $A$, $C_t$, and $\lambda$,
equating the resulting equations to zero, and solving them for the values
$A$, $C_t$, and $\lambda$, we obtain

$$A = \frac{1}{2n} \sum_t \sum_i X_{ti} = \bar{X}_{\cdot\cdot} \tag{11.05}$$

$$C_t = \frac{1}{2} \sum_i X_{ti} - \bar{X}_{\cdot\cdot} = \bar{X}_{t\cdot} - \bar{X}_{\cdot\cdot} \tag{11.06}$$

$$\lambda = 0 \tag{11.07}$$

¹ It should be pointed out that for the case n = 2, the $L_1$-test may sometimes
[...] the hypothesis here at the 1 per cent level.

TABLE 57
CALCULATION OF $L_1$ IN TESTING $H_0: \sigma_t = \sigma$

 n_t   d.f.   log n_t   n_t log n_t   θ'_t = Σ_i (X_ti - X̄_t)²   log θ'_t   n_t log θ'_t
  2     1     .30103      .60206             264.5               2.42243      4.84486
  2     1     .30103      .60206             264.5               2.42243      4.84486
  2     1     .30103      .60206               4.5                .65321      1.30642
  2     1     .30103      .60206             578.0               2.76193      5.52386
  2     1     .30103      .60206              24.5               1.38917      2.77834
  2     1     .30103      .60206             112.5               2.05115      4.10230
  2     1     .30103      .60206               2.0                .30103       .60206
  2     1     .30103      .60206             392.0               2.59329      5.18658
  2     1     .30103      .60206              50.0               1.69897      3.39794
  2     1     .30103      .60206              32.0               1.50515      3.01030
  2     1     .30103      .60206            1058.0               3.02449      6.04898
  2     1     .30103      .60206              84.5               1.92686      3.85372
  2     1     .30103      .60206                .5               -.30103      -.60206
  2     1     .30103      .60206               4.5                .65321      1.30642
  2     1     .30103      .60206               8.0                .90309      1.80618
  2     1     .30103      .60206               4.5                .65321      1.30642
  2     1     .30103      .60206             144.5               2.15987      4.31974
  2     1     .30103      .60206             648.0               2.81158      5.62316
  2     1     .30103      .60206             144.5               2.15987      4.31974

N = 38   log N = 1.57978   Σ n_t log n_t = 11.43914   Σ θ'_t = 3821.5   log Σ θ'_t = 3.58224   Σ n_t log θ'_t = 63.57982

Substituting these values in (11.04) to obtain the absolute minimum
value of $\phi$, we have

$$\chi_a^2 = \sum_t \sum_i (X_{ti} - \bar{X}_{t\cdot})^2 \tag{11.08}$$

which is the basis for testing the following hypothesis:

$$H_1: E(C_t) = 0 \qquad (E \text{ is the notation for expectation of a parameter}) \tag{11.09}$$

that is, the hypothesis that the [...] of the particular twin pair to wh[...]
hypothesis is true, then (11.04) becomes

$- 25 (Ха — A)? (11.10) 
$ t 


h 
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Minimizing with respect to A and substituting the obtained value, 
A = X.. in (11.10), we obtain the relative minimum: 


2 


xt = J J hu — £..)* = } Ke > Ko +I i= R.2 
i ^ ud (11.11) 
= + (11.12) 


where x? is the estimate of sum of squares for “within” and xj is the 
estimate of the sum of squares for "between." Then the test of Hı is 
given by 
EN NS 
Feci (11.13) 
with nı = n — 1 and m» = n. For purposes of calculation it is simpler 
to write x2 and xj in the form 


x d) Xu - Xn)? (11.14) 
t 
~ ұз 
А 0, 2 Xu) 
xi = 3 5 (Хи + Xa)? = mE (11.15) 
Calculations may be checked by 2 
Q b Xu) 
= pe) ==—o — (11.16) 
Separately, and using the identity, 
№ = № + (11.17) 


The efficient way to calculate the necessary values is shown in Table 56; we first form the sum and difference for each pair of values. We then calculate the sum and sum of squares for each column, except the last, where the sum of squares only is needed. By this method we secure a check on the calculations at each stage.

From the last two rows of Table 56, we obtain

$$\sum_t (X_{t1} - X_{t2})^2 = 7643 \qquad (11.18)$$

$$\sum_t (X_{t1} + X_{t2})^2 = 2{,}361{,}311 \qquad (11.19)$$

$$\sum_t \sum_i X_{ti} = 6649 \qquad (11.20)$$

$$\sum_t \sum_i X_{ti}^2 = \frac{1}{2}\left[\sum_t (X_{t1} + X_{t2})^2 + \sum_t (X_{t1} - X_{t2})^2\right] = 1{,}184{,}477 \qquad (11.21)$$

Substituting these values in (11.14), (11.15), and (11.16), we have

$$\chi_a^2 = 3821.5$$
$$\chi_b^2 = 17{,}255.5$$
$$\chi_r^2 = 21{,}077.0$$

We now place all of these values in one table as shown in Table 58. 


TABLE 58
ANALYSIS OF VARIANCE OF MENTAL AGES OF IDENTICAL TWINS REARED APART

Source of variation   D.F.   Sum of squares   Mean square     F      Hypothesis
Within pairs           19        3,821.5        201.132
Between pairs          18       17,255.5        958.638     4.766      Rej.
Total                  37       21,077.0

Referring to the F-table with $n_1 = 18$ and $n_2 = 19$, we find that the obtained value of F is significant at the 1 per cent level. This statement means that the mental age of an individual is not independent of the twin pair to which he belongs, or that there is a significant difference among the means of the 19 twin pairs. Another interpretation is that the intraclass correlation between twins is significantly greater than zero. Intraclass correlation is discussed below.
Fisher (Reference 3) has shown that an unbiased estimate of the intraclass correlation, $r'$, can be obtained from the relation

$$F = \frac{1 + (k - 1) r'}{1 - r'} \qquad (11.22)$$

where $k$ is the number in a group or class. Where $k = 2$,

$$F = \frac{1 + r'}{1 - r'} \qquad (11.23)$$

Thus in our problem,

$$F_0 = \frac{1 + r'}{1 - r'} = \frac{958.638}{201.132} = 4.7662, \qquad r' = .653$$

Also,

$$r' = \frac{958.638 - 201.132}{958.638 + (2 - 1)(201.132)} = .653$$


When there are equal numbers in the classes or groups, the variation of the class means relative to the variation of the individuals within the classes is measured by the intraclass correlation. If the class means differ significantly, a significant positive intraclass correlation is indicated; when the mean square between classes equals that within classes, the correlation is zero; and if the mean square between classes is less than that within classes, the intraclass correlation is negative.
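The arithmetic of (11.14) to (11.16) and of the intraclass correlation for k = 2 can be checked with a short script. The following is only a sketch: the function name `twin_pair_anova` and its argument names are illustrative and not part of the text, and the inputs are the totals quoted in (11.18) to (11.20).

```python
def twin_pair_anova(sum_sq_diff, sum_sq_sum, grand_total, n_pairs):
    """Within/between analysis for n_pairs pairs (two members per pair).

    sum_sq_diff : sum over pairs of (X_t1 - X_t2)^2
    sum_sq_sum  : sum over pairs of (X_t1 + X_t2)^2
    grand_total : sum of all 2 * n_pairs observations
    """
    correction = grand_total ** 2 / (2 * n_pairs)
    within = 0.5 * sum_sq_diff                              # (11.14)
    between = 0.5 * sum_sq_sum - correction                 # (11.15)
    total = 0.5 * (sum_sq_sum + sum_sq_diff) - correction   # (11.16), using (11.21)
    ms_within = within / n_pairs                # n degrees of freedom
    ms_between = between / (n_pairs - 1)        # n - 1 degrees of freedom
    f_ratio = ms_between / ms_within
    # Intraclass correlation for classes of size k = 2, solved from (11.23).
    r_intraclass = (ms_between - ms_within) / (ms_between + ms_within)
    return {
        "within": within,
        "between": between,
        "total": total,
        "F": f_ratio,
        "r_intraclass": r_intraclass,
    }


result = twin_pair_anova(7643, 2361311, 6649, 19)
for name, value in result.items():
    print(f"{name:>13}: {value:.3f}")
```

The printed sums of squares agree with Table 58 to the rounding used there, and the identity (11.17) holds by construction because the total is formed from the same two column totals.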

Problem XI.2. Testing the homogeneity of multiple groups of measurements. We shall apply the analysis-of-variance method to test the homogeneity of 6 sections in college zoology with respect to their achievement as measured by a final examination. The basic data are given in Table 59.

Denote by $X_{ts}$ the score of the $t$th student in the $s$th section. The basic assumption in the analysis is that we may write

$$X_{ts} = A + B_s + z_{ts} \qquad (11.24)$$

where $s = 1, 2, \dots, k$; $t = 1, 2, \dots, n_s$; $n_s$ denotes the number of students in the $s$th section, and $k$ denotes the number of sections. $A$ is a measure of the achievement of all the students and is defined as the mean score for all individuals and sections; $B_s$ is a measure of the achievement of the $s$th section; $z_{ts}$ is a measure of random effects, assumed to be normally distributed about zero with constant standard deviation, $\sigma$. The restriction is

$$\sum_s n_s B_s = 0 \qquad (11.25)$$

In assuming that $\sigma$ is constant, we are assuming that the variability of the scores is the same for each section. This assumption may not be fulfilled in practice, and hence, we must first test the hypothesis

$$H_0: \sigma_s = \sigma \qquad (11.26)$$
TABLE 59

Section   No. of students   Sum of scores   Sum of squares   Sum of squares about means
               n_s                             of scores              θ′_s
I              145             23,025          3,759,061         102,849.7931
II              91             13,529          2,065,833          54,472.1099
III             84             13,127          2,130,435          79,028.7024
IV             127             18,825          2,912,131         121,732.3779
V               46              6,828          1,071,968          58,455.3043
VI              82             12,889          2,108,159          82,228.2560
Total          575             88,223         14,047,587         511,417.0036


where $\sigma_s$ denotes the standard deviation of the scores in the $s$th section. If this hypothesis is accepted, we conclude that there is no difference in variability among the sections and then proceed to test the other hypothesis. If we reject the hypothesis $H_0$, we cannot make an exact test of another hypothesis.

The test of the hypothesis $H_0$ may be made as follows. We calculate

$$L_1 = \frac{\prod_s \left(\theta'_s / n_s\right)^{n_s / N}}{\frac{1}{N} \sum_s \theta'_s} \qquad (11.27)$$

where $N = \sum_s n_s$, $\prod$ denotes the product, and $\theta'_s$ denotes the “within” sections sum of squares for the $s$th section. We refer to Nayer’s tables of $L_1$, entered with $k$ and the harmonic mean of $f_s$, where $f_s$ denotes the d.f. associated with $\theta'_s$ in the $s$th section.

The computation of $L_1$ for the 6 sections is carried out as shown in Table 60.


TABLE 60
CALCULATION OF L₁ FOR THE TEST OF THE HYPOTHESIS H₀: σ_s = σ

 n_s    log n_s    n_s log n_s        θ′_s         log θ′_s    n_s log θ′_s
 145    2.16137     313.39865     102,849.7931     5.01221      726.77045
  91    1.95904     178.27264      54,472.1099     4.73618      430.99238
  84    1.92428     161.63952      79,028.7024     4.89778      411.41352
 127    2.10380     267.18260     121,732.3779     5.08541      645.84707
  46    1.66276      76.48696      58,455.3043     4.76682      219.27372
  82    1.91381     156.93242      82,228.2560     4.91502      403.03164

N = 575   log N = 2.75967   Σ n_s log n_s = 1153.91279   Σ θ′_s = 498,766.5436   log Σ θ′_s = 5.69790   Σ n_s log θ′_s = 2837.32878
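The quantities assembled in Table 60 can be reproduced directly from the section sizes and the “within” sums of squares of Table 59. The sketch below works through logarithms in the same way as the hand calculation; the function names `l1_statistic` and `harmonic_mean` are illustrative choices, not notation from the text, and the comparison with Nayer's tables still has to be made by hand.

```python
import math


def l1_statistic(n, theta):
    """L1 criterion of (11.27), computed through common logarithms.

    n     : group sizes n_s
    theta : within-group sums of squares, one per group
    """
    big_n = sum(n)
    log_l1 = (
        math.log10(big_n)
        - sum(ni * math.log10(ni) for ni in n) / big_n
        + sum(ni * math.log10(ti) for ni, ti in zip(n, theta)) / big_n
        - math.log10(sum(theta))
    )
    return 10 ** log_l1


def harmonic_mean(values):
    """Harmonic mean, used for the degrees of freedom f_s = n_s - 1."""
    return len(values) / sum(1.0 / v for v in values)


# Section sizes and within-section sums of squares from Table 59.
n_s = [145, 91, 84, 127, 46, 82]
theta_s = [102849.7931, 54472.1099, 79028.7024,
           121732.3779, 58455.3043, 82228.2560]

l1 = l1_statistic(n_s, theta_s)
f_bar = harmonic_mean([ni - 1 for ni in n_s])
print(f"L1 = {l1:.4f}, harmonic mean of f_s = {f_bar:.2f}")
```

Because $L_1$ is the ratio of a weighted geometric mean to an arithmetic mean of the per-observation sums of squares, it cannot exceed 1, and values close to 1 indicate nearly equal variances.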


To find the value of $L_1$, we calculate the value of $\log L_1$, where

$$\log L_1 = \log N - \frac{1}{N} \sum_s n_s \log n_s + \frac{1}{N} \sum_s n_s \log \theta'_s - \log \sum_s \theta'_s$$

and then find $L_1$ from a table of antilogarithms.

Here,

$$\log L_1 = 2.75967 - 2.00680 + 4.93448 - 5.69790 = 9.98945 - 10$$
$$L_1 = .9760$$

The harmonic mean of $f_s$ is

$$\frac{k}{\sum_s \frac{1}{f_s}} = 82.64$$

Referring to Nayer’s tables with k = 6 and d.f. = 83, we find that P > .05. We accept the hypothesis $H_0$ and conclude that the sections are of equal variability. Consequently, we can proceed to the analysis of variance.

The next step is to estimate the sum of squares for “within.” By the method of maximum likelihood, we obtain

$$\phi = \sum_s \sum_t (X_{ts} - A - B_s)^2 + 2\lambda \sum_s n_s B_s \qquad (11.28)$$


Differentiating $\phi$ partially with respect to $A$, $B_s$, and $\lambda$, equating these equations to zero, and solving the resulting equations for the values of $A$, $B_s$, and $\lambda$, we obtain

$$A = \frac{1}{N} \sum_s \sum_t X_{ts} = \bar{x}_{\cdot\cdot} \qquad (11.29)$$

$$B_s = \frac{1}{n_s} \sum_t X_{ts} - A = \bar{x}_{\cdot s} - \bar{x}_{\cdot\cdot} \qquad (11.30)$$

$$\lambda = 0 \qquad (11.31)$$

Substituting these values in (11.28) to obtain the absolute minimum value of $\phi$, we have

$$\chi_a^2 = \sum_s \sum_t (X_{ts} - \bar{x}_{\cdot s})^2 \qquad (11.32)$$

which is the basis of testing the following hypothesis:

$$H_1: E(B_s) = 0 \qquad (11.33)$$

($E$ is the notation for expectation of a parameter), that is, the hypothesis that the sections are equal in achievement. If the hypothesis is true, then (11.28) becomes

$$\phi = \sum_s \sum_t (X_{ts} - A)^2 \qquad (11.34)$$

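Minimizing with respect to $A$ and substituting the obtained value of $A = \bar{x}_{\cdot\cdot}$ in (11.34), we obtain the relative minimum:

$$\chi_r^2 = \sum_s \sum_t (X_{ts} - \bar{x}_{\cdot\cdot})^2 = \chi_a^2 + \left[\sum_s \frac{\left(\sum_t X_{ts}\right)^2}{n_s} - \frac{\left(\sum_s \sum_t X_{ts}\right)^2}{N}\right] = \chi_a^2 + \chi_b^2 \qquad (11.35)$$

where $\chi_a^2$ is the estimate of sum of squares for “within” and $\chi_b^2$ is the estimate of sum of squares for “between.” Then the test of $H_1$ is given by

$$F = \frac{\chi_b^2 / n_1}{\chi_a^2 / n_2} \qquad (11.36)$$

with $n_1 = 5$ and $n_2 = N - 6$.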
The “within” sum of squares may be obtained directly from the last row of Table 60, $\sum_s \theta'_s = 498{,}766.5436$. The “between” sum of squares is calculated from the totals given in the third column of Table 59 as follows:

$$\frac{(23{,}025)^2}{145} + \frac{(13{,}529)^2}{91} + \frac{(13{,}127)^2}{84} + \frac{(18{,}825)^2}{127} + \frac{(6828)^2}{46} + \frac{(12{,}889)^2}{82} - \frac{(88{,}223)^2}{575} = 12{,}650.4599$$

The total sum of squares is

$$14{,}047{,}587 - \frac{(88{,}223)^2}{575} = 511{,}417.0036$$

To test the hypothesis $H_1$, we calculate

$$F_0 = \frac{12{,}650.46 / 5}{498{,}766.5436 / 569} = \frac{2530.092}{876.567} = 2.886$$

Referring to the F tables with $n_1 = 5$ and $n_2 = 569$, we find that .05 > P > .01. Statistically, the acceptance of $H_1$ remains in doubt. We may state that the differences among the means of sections are significant at the 5 per cent level but not significant at the 1 per cent level. The results are summarized in Table 61.


TABLE 61

Source of variance   D.F.   Sum of squares   Mean square     F      Hypothesis
Between sections       5      12,650.46        2530.092    2.886    Remains in doubt
Within sections      569     498,766.5436       876.567
Total                574     511,417.0036
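The between, within, and total sums of squares of Table 61 follow from three columns of Table 59 alone. A minimal sketch of that computation is given below; the function name `one_way_anova` and the dictionary keys are illustrative, and the significance level still has to be read from F tables.

```python
def one_way_anova(n, sums, sum_of_squares):
    """One-way analysis of variance from group sizes, group score totals,
    and the sum of squared scores over all individuals."""
    big_n = sum(n)
    k = len(n)
    correction = sum(sums) ** 2 / big_n
    total = sum_of_squares - correction
    between = sum(s * s / m for s, m in zip(sums, n)) - correction
    within = total - between
    ms_between = between / (k - 1)
    ms_within = within / (big_n - k)
    return {
        "between": between,
        "within": within,
        "total": total,
        "df_between": k - 1,
        "df_within": big_n - k,
        "F": ms_between / ms_within,
    }


# Columns of Table 59: number of students, sum of scores, and the grand
# sum of squares of scores for the six zoology sections.
anova = one_way_anova(
    n=[145, 91, 84, 127, 46, 82],
    sums=[23025, 13529, 13127, 18825, 6828, 12889],
    sum_of_squares=14047587,
)
print(anova)
```

Small differences in the second decimal place from the printed sums of squares come from rounding in the hand calculation; the value of F agrees to three decimals.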


analysis of variance could be used, Ь 


$$t = \frac{\bar{X}_i - \bar{X}_j}{s \sqrt{\frac{1}{N_i} + \frac{1}{N_j}}} = \frac{158.8 - 148.2}{28.84 \sqrt{\frac{1}{145} + \frac{1}{127}}}$$

for n = 270 the corresponding P < .01. The difference selected, however, is only 1 of 15 that might be found among the means of the six sections. The required probability for the selected difference to be significant is set, therefore, not as 1 in 100 but as 1 in (15)(100) = 1500. Since the probability corresponding to the observed value of t is about .0024, or 2.4 in 1000, it is greater than 1.5 in 1000 and therefore is regarded as not significant.

Problem XI.3. An application of the analysis of covariance. The
process of applying the analysis of covariance consists in breaking up 
the sum of products into parts assignable to different factors. This is 
comparable to the process of breaking up the sum of squares in the case 
of the analysis of variance. 

We shall apply the method of the analysis of variance and covariance 
to the combined analysis of mental-age scores and educational-age 
scores, as measured by the Stanford Achievement Test, in the case of the 
19 pairs of identical twins reared apart. 

Let $X_{ti}$ denote the educational age of the $i$th member of the $t$th twin pair and $Y_{ti}$ the mental age of the $i$th member of the $t$th twin pair. We may then write

$$X_{ti} = A + C_t + z_{ti} \qquad (11.37)$$

and

$$Y_{ti} = B + D_t + z'_{ti} \qquad (11.38)$$

with restrictions

$$\sum_t C_t = 0 \qquad (11.39)$$

$$\sum_t D_t = 0 \qquad (11.40)$$

where $i = 1, 2$ and $t = 1, 2, \dots, n$, where $n$ is the number of twin pairs.
The difference between the educational ages of the pairs of identical twins may be due partly or wholly to differences in mental age. The problem is to find out what part of these differences may be assigned to differences in mental age and to adjust the analysis accordingly. We wish to find out whether there is a difference in achievement of the identical twins when they may be regarded as of the same mental age.

If we may assume that there is a linear relationship between educational age and mental age,² we may write, since $Y_{ti}$ denotes the mental age of the $i$th member of the $t$th twin pair,

$$X_{ti} = a + bY_{ti} + S_{ti} \qquad (11.41)$$

where $a$ and $b$ are parameters to be estimated from the data; $b$ is the regression coefficient of educational age on mental age; $S_{ti}$ is the measure of the differences between the educational ages of members of the same twin pairs and of those between the educational ages of pairs of twins not attributable to the factor of mental age.

² For the test of linearity see page 240.
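As formerly, we minimize:

$$\phi = \sum_t \sum_i (X_{ti} - a - bY_{ti})^2 + 2\lambda_1 \sum_t C_t + 2\lambda_2 \sum_t D_t \qquad (11.42)$$

with regard to $a$, $b$, $\lambda_1$, and $\lambda_2$ to obtain the relative minimum of $\phi$, $\chi_r^2$. Solving for $a$, $b$, $\lambda_1$, and $\lambda_2$, we have

$$a = \frac{1}{2n} \sum_t \sum_i X_{ti} - \frac{b}{2n} \sum_t \sum_i Y_{ti} \qquad (11.43)$$

$$b = \frac{\sum_t (X_{t1}Y_{t1} + X_{t2}Y_{t2}) - \frac{1}{2n}\left[\sum_t (X_{t1} + X_{t2})\right]\left[\sum_t (Y_{t1} + Y_{t2})\right]}{\sum_t \sum_i Y_{ti}^2 - \frac{1}{2n}\left(\sum_t \sum_i Y_{ti}\right)^2} \qquad (11.44)$$

$$\lambda_1 = 0 \qquad (11.45)$$

$$\lambda_2 = 0 \qquad (11.46)$$

Substituting the values of $a$, $b$, $\lambda_1$, and $\lambda_2$ into Equation (11.42), we have

$$\chi_r^2 = \frac{1}{2} \sum_t (X_{t1} - X_{t2})^2 + \left[\frac{1}{2} \sum_t (X_{t1} + X_{t2})^2 - \frac{1}{2n}\left(\sum_t \sum_i X_{ti}\right)^2\right] - b\left\{\sum_t (X_{t1}Y_{t1} + X_{t2}Y_{t2}) - \frac{1}{2n}\left[\sum_t (X_{t1} + X_{t2})\right]\left[\sum_t (Y_{t1} + Y_{t2})\right]\right\} \qquad (11.47)$$

$$= \chi_a^2 + \chi_b^2, \text{ say} \qquad (11.48)$$

The proportion of the variance attributable to mental age is

$$l = b\left\{\sum_t (X_{t1}Y_{t1} + X_{t2}Y_{t2}) - \frac{1}{2n}\left[\sum_t (X_{t1} + X_{t2})\right]\left[\sum_t (Y_{t1} + Y_{t2})\right]\right\} \qquad (11.49)$$

To obtain $\chi_b^2$, we subtract $l$ from the “between” pairs sum of squares, since the other two quantities in (11.47) are the “within” and “between” pairs sums of squares for educational age.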

The necessary calculations may be efficiently carried out if the data 
are arranged as shown in Table 62. 

The results are presented in tabular form in Table 63. 


TABLE 62
EDUCATIONAL AND MENTAL AGES OF 19 PAIRS OF IDENTICAL TWINS REARED APART

         Educational age                              Mental age
Pair     X_t1     X_t2    |X_t1 − X_t2|   X_t1 + X_t2    Y_t1     Y_t2     X_t1Y_t1    X_t2Y_t2    (X_t1 + X_t2)(Y_t1 + Y_t2)
  1       181      200         19            381          163      186      29,503      37,200           132,969
  2       131      169         38            300          126      149      16,506      25,181            82,500
  3       205      189         16            394          191      194      39,155      36,666           151,690
  4       173      207         34            380          170      204      29,410      42,228           142,120
  5       176      182          6            358          171      178      30,096      32,396           124,942
  6       151      155          4            306          195      180      29,445      27,900           114,750
  7       191      189          2            380          170      172      32,470      32,508           129,960
  8       175      162         13            337          170      142      29,750      23,004           105,144
  9       210      202          8            412          195      185      40,950      37,370           156,560
 10       181      200         19            381          187      195      33,847      39,000           145,542
 11       157      226         69            383          176      222      27,632      50,172           152,434
 12       224      210         14            434          223      210      49,952      44,100           187,922
 13       196      189          7            385          181      182      35,476      34,398           139,755
 14       176      159         17            335          164      161      28,864      25,599           108,875
 15       159      161          2            320          175      171      27,825      27,531           110,720
 16       130      131          1            261          123      120      15,990      15,720            63,423
 17       176      176          0            352          192      175      33,792      30,800           129,184
 18       192      157         35            349          184      148      35,328      23,236           115,868
 19       177      172          5            349          168      151      29,736      25,972           111,331

Total   3,361    3,436                     6,797        3,324    3,325     595,727     610,981         2,405,689

Sum of
squares 605,187  631,518     10,417      2,462,993     590,946  593,531

Sum of products, Σ(X_t1Y_t1 + X_t2Y_t2) = 1,206,708


TABLE 63
ANALYSIS OF VARIANCE AND COVARIANCE OF EDUCATIONAL AND MENTAL AGES

                             Sums of squares
Source of                 Mental      Educational     Sums of      Regression     Correlation
variance         D.F.      age           age          products     coefficient    coefficient
Between pairs     18     17,255.5      15,727.8       13,548.4        0.785          .822
Within pairs      19      3,821.5       5,208.5        3,863.5        1.011          .866
Total             37     21,077.0      20,936.3       17,411.9        0.826          .829

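The quantities entered in Table 63 are calculated as follows.

Educational age:

Between pairs: $\frac{1}{2} \sum_t (X_{t1} + X_{t2})^2 - \frac{1}{2n}\left(\sum_t \sum_i X_{ti}\right)^2 = 15{,}727.8$

Within pairs: $\frac{1}{2} \sum_t (X_{t1} - X_{t2})^2 = 5208.5$

Total: $\sum_t \sum_i X_{ti}^2 - \frac{1}{2n}\left(\sum_t \sum_i X_{ti}\right)^2 = 20{,}936.3$

Mental age:

Between pairs: $\frac{1}{2} \sum_t (Y_{t1} + Y_{t2})^2 - \frac{1}{2n}\left(\sum_t \sum_i Y_{ti}\right)^2 = 17{,}255.5$

Within pairs: $\frac{1}{2} \sum_t (Y_{t1} - Y_{t2})^2 = 3821.5$

Total: $\sum_t \sum_i Y_{ti}^2 - \frac{1}{2n}\left(\sum_t \sum_i Y_{ti}\right)^2 = 21{,}077.0$

Products of educational age by mental age:

Between pairs: $\frac{1}{2} \sum_t (X_{t1} + X_{t2})(Y_{t1} + Y_{t2}) - \frac{1}{2n}\left[\sum_t (X_{t1} + X_{t2})\right]\left[\sum_t (Y_{t1} + Y_{t2})\right] = \frac{1}{2}(2{,}405{,}689) - \frac{1}{38}[6797][6649] = 13{,}548.4$

Within pairs: $\sum_t X_{t1}Y_{t1} + \sum_t X_{t2}Y_{t2} - \frac{1}{2} \sum_t (X_{t1} + X_{t2})(Y_{t1} + Y_{t2}) = 1{,}206{,}708 - \frac{1}{2}(2{,}405{,}689) = 3863.5$

Total: $\sum_t X_{t1}Y_{t1} + \sum_t X_{t2}Y_{t2} - \frac{1}{2n}\left[\sum_t (X_{t1} + X_{t2})\right]\left[\sum_t (Y_{t1} + Y_{t2})\right] = 1{,}206{,}708 - \frac{1}{38}[6797][6649] = 17{,}411.9$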
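The entries of Table 63 can be recomputed from the paired scores themselves. The sketch below assumes the educational and mental ages as read from Table 62; the helper name `pair_components` is an illustrative choice, and a sum of squares is obtained by passing the same variable twice.

```python
import math

# Educational ages (X) and mental ages (Y) of the two members of each pair,
# as read from Table 62.
X1 = [181, 131, 205, 173, 176, 151, 191, 175, 210, 181,
      157, 224, 196, 176, 159, 130, 176, 192, 177]
X2 = [200, 169, 189, 207, 182, 155, 189, 162, 202, 200,
      226, 210, 189, 159, 161, 131, 176, 157, 172]
Y1 = [163, 126, 191, 170, 171, 195, 170, 170, 195, 187,
      176, 223, 181, 164, 175, 123, 192, 184, 168]
Y2 = [186, 149, 194, 204, 178, 180, 172, 142, 185, 195,
      222, 210, 182, 161, 171, 120, 175, 148, 151]


def pair_components(u1, u2, v1, v2):
    """Between-pairs, within-pairs, and total sums of products of u and v."""
    n = len(u1)
    raw = sum(a * c + b * d for a, b, c, d in zip(u1, u2, v1, v2))
    half_pair_products = 0.5 * sum((a + b) * (c + d)
                                   for a, b, c, d in zip(u1, u2, v1, v2))
    correction = (sum(u1) + sum(u2)) * (sum(v1) + sum(v2)) / (2 * n)
    return {
        "between": half_pair_products - correction,
        "within": raw - half_pair_products,
        "total": raw - correction,
    }


ss_x = pair_components(X1, X2, X1, X2)   # educational age sums of squares
ss_y = pair_components(Y1, Y2, Y1, Y2)   # mental age sums of squares
sp = pair_components(X1, X2, Y1, Y2)     # sums of products

# Regression of educational age on mental age, and correlation, per row.
b = {row: sp[row] / ss_y[row] for row in sp}
r = {row: sp[row] / math.sqrt(ss_x[row] * ss_y[row]) for row in sp}

for row in ("between", "within", "total"):
    print(f"{row:>8}: SSy={ss_y[row]:9.1f} SSx={ss_x[row]:9.1f} "
          f"SP={sp[row]:9.1f} b={b[row]:.3f} r={r[row]:.3f}")
```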


Two methods for adjusting the sum of squares of educational age are given. The first method makes possible a more [...] test of significance. The adjusted sums of squares are obtained for “within” pairs and “total,” each with its own regression coefficient, and the adjusted “between” pairs sum of squares is obtained by subtraction:

Within pairs: $5208.5 - \frac{(3863.5)^2}{3821.5} = 1302.5$

Total: $20{,}936.3 - \frac{(17{,}411.9)^2}{21{,}077.0} = 6552.2$

Between pairs: $6552.2 - 1302.5 = 5249.7$

TABLE 64
ANALYSIS OF VARIANCE OF EDUCATIONAL AGE SCORES OF THE 19 PAIRS OF IDENTICAL TWINS, METHOD 1: ORIGINAL AND ADJUSTED SUMS OF SQUARES AND MEAN SQUARES

                       Original analysis                 Adjusted analysis
Source of              Sum of       Mean                 Sum of      Mean
variance       D.F.    squares      square      D.F.     squares     square
Between pairs   18    15,727.8     873.767       18      5249.7      291.65
Within pairs    19     5,208.5     274.132       18      1302.5       72.36
Total           37    20,936.3     565.846       36      6552.2


A second method of adjusting the sum of squares is shown in Table 65. Both the “between pairs” and the “within pairs” sums of squares are adjusted by the use of the “within pairs” regression coefficient:

$$\sum (x - by)^2 = \sum x^2 - 2b \sum xy + b^2 \sum y^2$$
$$= 15{,}727.8 - 2\left(\frac{3863.5}{3821.5}\right)(13{,}548.4) + \left(\frac{3863.5}{3821.5}\right)^2 (17{,}255.5)$$
$$= 15{,}727.8 - 2.02198(13{,}548.4) + 1.02210(17{,}255.5)$$
$$= 15{,}727.8 - 27{,}394.5938 + 17{,}636.8466 = 5970.053$$

In certain cases it may be necessary to adjust each sum of squares with its own regression coefficient (Ref. 5).
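Both adjustments reduce to a few lines once the sums of squares and products of Table 63 are at hand. The following sketch uses those printed values as inputs; the function and variable names are illustrative and not taken from the text.

```python
def residual_ss(ss_x, ss_y, sp):
    """Sum of squares of x remaining after regression on y, using the
    regression coefficient b = sp / ss_y of the same row."""
    return ss_x - sp ** 2 / ss_y


# Rows of Table 63: (educational age SS, mental age SS, sum of products).
between = (15727.8, 17255.5, 13548.4)
within = (5208.5, 3821.5, 3863.5)
total = (20936.3, 21077.0, 17411.9)

# Method 1: adjust "within" and "total" with their own regression
# coefficients and obtain "between" by subtraction.
within_adj = residual_ss(*within)
total_adj = residual_ss(*total)
between_adj_method1 = total_adj - within_adj

# Method 2: adjust "between" with the within-pairs regression coefficient.
b_within = within[2] / within[1]
between_adj_method2 = (between[0]
                       - 2 * b_within * between[2]
                       + b_within ** 2 * between[1])

print(f"within adjusted:      {within_adj:.1f}")
print(f"total adjusted:       {total_adj:.1f}")
print(f"between (method 1):   {between_adj_method1:.1f}")
print(f"between (method 2):   {between_adj_method2:.1f}")
```

The two methods give different adjusted “between” sums of squares because they remove the influence of mental age with different regression coefficients; the “within” line is the same in both.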


TABLE 65
ANALYSIS OF VARIANCE OF EDUCATIONAL AGE SCORES OF THE 19 PAIRS OF IDENTICAL TWINS, METHOD 2: ORIGINAL AND ADJUSTED SUMS OF SQUARES AND MEAN SQUARES

                  Original analysis              Adjusted analysis
Source of         Sum of        Mean                   Sum of       Mean
variance          squares       square        D.F.     squares      square
Between pairs    15,727.8      873.767         18     5970.053     331.670
Within pairs      5,208.5      274.132         18     1302.500      72.361
Total            20,936.3      565.846

The adjusted between pairs sum of squares and mean square give a measure of the difference between twin pairs in educational age freed from the influence of mental age. To test the hypothesis that these adjusted differences are zero, we calculate:

$$z_0 = \frac{1}{2} \log_e \frac{\text{mean square between pairs}}{\text{mean square within pairs}}$$

and refer to Fisher's tables of $z$ with degrees of freedom $n_1 = n - 1$ and $n_2 = n - 1$, where $n$ is the number of twin pairs. In our example we obtain:

$$z_0 = \frac{1}{2} \log_e \frac{291.65}{72.36} = \frac{1}{2} \log_e 4.03 = .697 \quad \text{(Table 64)}$$

or

$$z_0 = \frac{1}{2} \log_e \frac{331.670}{72.361} = \frac{1}{2} \log_e 4.584 = .761 \quad \text{(Table 65)}$$

From Fisher’s tables of $z$, entered with degrees of freedom $n_1 = 18$ and $n_2 = 18$, we find that $z_0$ is greater than the value given in the table at the 1 per cent level. We could also have used the tables of Snedecor's F. We reject the hypothesis and conclude that, when the influence of mental age is removed, the means of the twin pairs in educational age still differ significantly.
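Fisher's $z$ is simply half the natural logarithm of the variance ratio, so either table may be used. As a small check on the two values above, assuming the adjusted mean squares printed in Tables 64 and 65 (the function name `fisher_z` is illustrative):

```python
import math


def fisher_z(ms_between, ms_within):
    """Fisher's z: half the natural logarithm of the ratio of mean squares."""
    return 0.5 * math.log(ms_between / ms_within)


z_method1 = fisher_z(291.65, 72.36)      # adjusted mean squares, Table 64
z_method2 = fisher_z(331.670, 72.361)    # adjusted mean squares, Table 65

# The equivalent variance ratio F is recovered as exp(2 z).
f_method1 = math.exp(2 * z_method1)
f_method2 = math.exp(2 * z_method2)

print(f"Method 1: z = {z_method1:.3f}, F = {f_method1:.2f}")
print(f"Method 2: z = {z_method2:.3f}, F = {f_method2:.3f}")
```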


We obtain three measures of the degree of relationship between educational age and mental age from the results of Table 63. From the first row, for between pairs, we have

$$r_1 = \frac{13{,}548.369}{\sqrt{(17{,}255.5)(15{,}727.8)}} = \frac{13{,}548.369}{\sqrt{271{,}391{,}052.9}} = .822$$

From the second row, for within pairs, we have

$$r_2 = \frac{3863.5}{\sqrt{(3821.5)(5208.5)}} = .866$$

From the third row, for the total, we have

$$r_3 = \frac{17{,}411.9}{\sqrt{(21{,}077)(20{,}936.3)}} = \frac{17{,}411.9}{21{,}006} = .829$$


The second, r, = 
ship; in the third, тз = 


.822 


Ts 


the observational data, 
larly the product-moment 
of linearity of regression. 
forward method of testing 4 


" he type of regression. Since linear regression 
is the type most often encountered, we shall consid 


Д : : à er here the problem. of 
testing the linearity of regression (Ref, 5).z 
Dci леді 

3 For other cases of polynomial e. uati i 
of squares corresponding to indivi дев ап special 


ual di 
effects are represented by polynomials of differen hon 


ient, involves the assumption 
The analysi i 


y for the separation of Hume 
9m where the independen 
t degree, sce page 309. 


We may take as a practical problem the case presented in Table 66, that of determining whether the relationship between the scores of the same individuals on two tests, one administered prior and the other subsequent to instruction, was linear in form. We shall also test another assumption underlying the product-moment correlation method, the homoscedasticity of variances of the different arrays, that is, whether the variances of the different arrays are equal.


TABLE 66 
CORRELATION TABLE FOR THE INITIAL AND FINAL Scores or 263 STUDENTS ON A Test 
ім COLLEGE BroLocy 


X (initial score) 


10-|12-|14-|16-|18-|20- 26—|28—|30—| F 

Y 44- 1 
42- 1 
40— 1 2 
F 38— 2 1 3 
i 36— 1 2 4 3 4 4 5 32 
n 34— 2 2 5 6 3 6 3 34 
a 82- 1 2 5 5 3 4 4 36 
1 30— 114! Bi wi 9| % 4 37 
28— 2 7 8 1 5 3 2 31 
S 26— 1 4 5 5 3 2 2 30 
c 24— 2 5 6 3 3 3 1 29 
o 22— 1 2 4 1 1 14 
r 20— 2 1 1 1 8 
e 18— 1 1 3 
16— 1 1 
14— 1 

F 30 29 10 | 263 


Let $X$ and $Y$ represent the scores on the initial and final tests, respectively. Then the regression function, when linear, is given by

$$\hat{Y} = a + b(X - \bar{X}) \qquad (11.50)$$

where $a$ and $b$ are two parameter values, the value chosen for $a$ being the mean, $\bar{Y}$, of the observed values $Y$, and the value given to $b$ being the estimate of the regression coefficient of $Y$ on $X$. $\hat{Y}$ is the expected value of $Y$ for each $X$, and $\bar{X}$ is the mean of the $X$ values.


In Table 66 the data are grouped, and we shall take as the selected values of $X$ the mid-points of the several class intervals as shown in Table 67. It is observed in Table 66 that for each $X$ the several values of $Y$ form an array. Then, letting $Y_{ts}$ represent the score on the final test of the $t$th individual in the $s$th array, we have

$$Y_{ts} = A + B_s + z_{ts} \qquad (11.51)$$

where $t = 1, 2, \dots, n_s$; $s = 1, 2, \dots, k$; $k$ denotes the number of arrays and $n_s$ the number of individuals in the $s$th array. $A$ is the mean of the scores of all individuals on the final test; $B_s$ gives the measure of achievement on the final test of all individuals in the $s$th array; and $z_{ts}$ represents the measure of residual variation or the portion of $Y_{ts}$ attributable to random factors, such as errors of measurement, which are independent of $X$. $z_{ts}$ is assumed to be normally distributed about 0 with standard deviation $\sigma$, supposed to be the same for all arrays. The latter assumption will be tested first, by using the $L_1$-test. The sums and sums of squares of the scores on the final test in each array are given in Table 67. From Welch's formula for the $L_1$-test,

$$L_1 = \frac{\prod_s \left(\theta'_s / n_s\right)^{n_s / N}}{\frac{1}{N} \sum_s \theta'_s} \qquad (11.52)$$

we have

$$\log L_1 = \log N - \frac{1}{N} \sum_s n_s \log n_s + \frac{1}{N} \sum_s n_s \log \theta'_s - \log \left(\sum_s \theta'_s\right) \qquad (11.53)$$
TABLE 67
SUM AND SUM OF SQUARES OF FINAL SCORES IN EACH ARRAY

Array    Mid-point    No. in     Sum of      Sum of squares     (Σ_t Y_ts)²/n_s     θ′_s = (5) − (6)
 no.       X_s       array n_s   scores         of scores
                                Σ_t Y_ts       Σ_t Y_ts²
 (1)       (2)         (3)        (4)             (5)                (6)                 (7)
  1       10.5          12       318.0          8,659.00          8,427.00             232.00
  2       12.5          10       265.0          7,286.50          7,022.50             264.00
  3       14.5          14       407.0         12,379.50         11,832.07             547.43
  4       16.5          30       841.0         24,149.50         23,576.03             573.47
  5       18.5          41      1190.5         35,408.25         34,565.62             842.63
  6       20.5          33      1016.5         32,016.25         31,312.80             703.45
  7       22.5          29       852.5         25,809.25         25,060.56             748.69
  8       24.5          30       919.0         29,023.25         28,152.03             871.22
  9       26.5          32      1028.0         33,484.00         33,024.50             459.50
 10       28.5          22       713.0         23,447.50         23,107.68             339.82
 11       30.5          10       325.0         10,930.50         10,562.50             368.00
Total                  263      7875.5        242,593.50        236,643.29            5950.21

TABLE 68
CALCULATION OF L₁ FOR THE TEST OF THE HYPOTHESIS H₀: σ_s = σ

 f_s    n_s    log n_s    n_s log n_s     θ′_s     log θ′_s    n_s log θ′_s
  11     12    1.0792       12.9504      232.00     2.3655       28.3860
   9     10    1.0000       10.0000      264.00     2.4216       24.2160
  13     14    1.1461       16.0454      547.43     2.7383       38.3362
  29     30    1.4771       44.3130      573.47     2.7585       82.7550
  40     41    1.6128       66.1248      842.63     2.9256      119.9496
  32     33    1.5185       50.1105      703.45     2.8472       93.9576
  28     29    1.4624       42.4096      748.69     2.8742       83.3518
  29     30    1.4771       44.3130      871.22     2.9401       88.2030
  31     32    1.5051       48.1632      459.50     2.6623       85.1936
  21     22    1.3424       29.5328      339.82     2.5313       55.6886
   9     10    1.0000       10.0000      368.00     2.5658       25.6580

N = 263   Σ n_s log n_s = 373.9627   Σ θ′_s = 5950.21   Σ n_s log θ′_s = 725.6954

In our problem, as shown in Table 68, we find

$$\log L_1 = 2.4200 - 1.4219 + 2.7593 - 3.7745 = 9.9829 - 10$$

So we obtain $L_1 = .961$. Referring to Nayer’s tables with $k = 11$ and harmonic mean,

$$\frac{11}{\frac{1}{11} + \frac{1}{9} + \frac{1}{13} + \frac{1}{29} + \frac{1}{40} + \frac{1}{32} + \frac{1}{28} + \frac{1}{29} + \frac{1}{31} + \frac{1}{21} + \frac{1}{9}} = 18$$

we find that the value of $L_1$ is greater than the tabled value at the 5 per cent level, so we may assume that $\sigma_s$ is constant. The first analysis of the scores for the final test is given in Table 69.


TABLE 69
ANALYSIS OF VARIANCE OF SCORES ON FINAL TEST

Source of variation          D.F.    Sum of squares    Mean square
Between means of arrays       10         812.49           81.249
Within arrays                252        5950.21           16.904
Total                        262        6762.70


The analysis consists in breaking up the total sum of squares into two components. One component gives the mean-square estimate of the population variance between means of arrays and the other the mean-square estimate within arrays. The respective mean squares are given in the analysis-of-variance table. The sums of squares for each source of variation are obtained as follows, making use of the totals recorded in Table 67.

The within-arrays sum of squares is the total of column (7), 5950.21; the between-means of arrays sum of squares is obtained from the totals of columns (4) and (6):

$$236{,}643.29 - \frac{(7875.5)^2}{263} = 812.49$$

and the total sum of squares is calculated from the totals of columns (4) and (5):

$$242{,}593.50 - \frac{(7875.5)^2}{263} = 6762.70$$
The hypothesis $H_1$, that the regression of $Y$ on $X$ is linear, is stated as follows:

$$H_1: \hat{Y}_s = a + b(X_s - \bar{X}) \qquad (11.54)$$

where $\hat{Y}_s$ is the expected value of $Y$ for $X_s$, the $s$th value of $X$. If $H_1$ is accepted, then Equation (11.51) may be written

$$Y_{ts} = a + b(X_s - \bar{X}) + z_{ts} \qquad (11.55)$$

$H_1$ may then be tested in the conventional manner of testing a linear hypothesis. We have

$$\chi^2 = \sum_s \sum_t (Y_{ts} - A - B_s)^2 \qquad (11.56)$$

We then minimize $\chi^2$ with respect to all parameters to get the absolute minimum $\chi_a^2$. Thus:

$$\chi_a^2 = \sum_s \sum_t (Y_{ts} - \bar{Y}_s)^2 \qquad (11.57)$$

which gives the sum of squares within arrays.

We then minimize $\chi^2$ with respect to the parameters remaining under the assumption that $H_1$ is true. Thus, minimize

$$\chi^2 = \sum_s \sum_t \left[Y_{ts} - a - b(X_s - \bar{X})\right]^2 \qquad (11.58)$$

with respect to $a$ and $b$ to get the relative minimum, $\chi_r^2$. We get

$$a = \frac{1}{N} \sum_s \sum_t Y_{ts} = \bar{Y} \qquad (11.59)$$

$$b = \frac{\sum_s \left[(X_s - \bar{X}) \sum_t Y_{ts}\right]}{\sum_s \left[n_s (X_s - \bar{X})^2\right]} \qquad (11.60)$$

Then we may write

$$\chi_r^2 = \chi_a^2 + \left\{\sum_s \frac{\left(\sum_t Y_{ts}\right)^2}{n_s} - \frac{\left(\sum_s \sum_t Y_{ts}\right)^2}{N} - \frac{\left[\sum_s (X_s - \bar{X}) \sum_t Y_{ts}\right]^2}{\sum_s n_s (X_s - \bar{X})^2}\right\} \qquad (11.61)$$

$$= \chi_a^2 + \chi_b^2, \text{ say}$$

$\chi_b^2$ is observed as equal to the “between means of arrays” sum of squares minus the quantity $l$, where

$$l = \frac{\left[\sum_s (X_s - \bar{X}) \sum_t Y_{ts}\right]^2}{\sum_s n_s (X_s - \bar{X})^2} \qquad (11.62)$$

We now test the hypothesis $H_1$ by calculating

$$F = \frac{\chi_b^2 / n_1}{\chi_a^2 / n_2} \qquad (11.63)$$

and then refer to Snedecor’s tables of F (Table IV, Appendix) with $n_1 = k - 2$ and $n_2 = \sum_s n_s - k$.

The components with the corresponding calculated values are then entered in an analysis-of-variance table. The quantity $l$ is entered as the variation “due to linear regression” and $\chi_b^2$ as the variation “due to departure from linear regression.”
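The partition described in (11.56) to (11.63) can be sketched as a small function that takes the array mid-points and the scores in each array. The data used below are hypothetical and chosen only so that the arithmetic is easy to follow; they are not the scores of Table 66, and the function name `linearity_test` is an illustrative choice.

```python
def linearity_test(x_values, arrays):
    """Split the between-arrays sum of squares into the part due to linear
    regression (l) and the part due to departure from linearity.

    x_values : the X value (class mid-point) of each array
    arrays   : list of lists, the Y scores falling in each array
    """
    n = [len(a) for a in arrays]
    big_n = sum(n)
    k = len(arrays)
    sums = [sum(a) for a in arrays]
    grand = sum(sums)
    x_bar = sum(ni * xi for ni, xi in zip(n, x_values)) / big_n

    within = sum(sum(y * y for y in a) - s * s / ni
                 for a, s, ni in zip(arrays, sums, n))            # (11.57)
    between = sum(s * s / ni for s, ni in zip(sums, n)) - grand ** 2 / big_n
    numerator = sum((xi - x_bar) * s for xi, s in zip(x_values, sums))
    denominator = sum(ni * (xi - x_bar) ** 2 for ni, xi in zip(n, x_values))
    linear = numerator ** 2 / denominator                         # l of (11.62)
    departure = between - linear
    f_ratio = (departure / (k - 2)) / (within / (big_n - k))      # (11.63)
    return {"within": within, "between": between, "linear": linear,
            "departure": departure, "F": f_ratio,
            "df": (k - 2, big_n - k)}


# Hypothetical example 1: array means 3, 5, 7, 9 lie exactly on a line.
straight = linearity_test([1, 2, 3, 4], [[2, 4], [4, 6], [6, 8], [8, 10]])

# Hypothetical example 2: array means 2, 8, 8, 2 show no linear trend.
curved = linearity_test([1, 2, 3, 4], [[1, 3], [7, 9], [7, 9], [1, 3]])

print(straight)
print(curved)
```

In the first example the whole between-arrays sum of squares is absorbed by the linear term and the departure is zero; in the second the linear term vanishes and all of the between-arrays variation is departure from linearity.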
We now proceed to calculate $l$ using the values recorded in Table 70, from which we get

$$l = \frac{(452.70)^2}{7046.68} = 29.08$$

Finally, the complete analysis in our problem is summarized in Table 71. For the test of the hypothesis $H_1$, we obtain

$$F_0 = \frac{87.046}{16.904} = 5.15$$

We enter the F-tables with $n_1 = 9$ and $n_2 = 252$ and find that $F_0$ is greater than the interpolated value of F at the 1 per cent level. Therefore, we reject the hypothesis $H_1$ and conclude that the regression of $Y$ on $X$ is nonlinear in form.

TABLE 70
CALCULATION OF THE VALUE OF l

 n_s     X_s     X_s − X̄     Σ_t Y_ts     (X_s − X̄)(Σ_t Y_ts)     n_s(X_s − X̄)²
  12    10.5     -10.6         318.0           -3370.80               1348.32
  10    12.5      -8.6         265.0           -2279.00                739.60
  14    14.5      -6.6         407.0           -2686.20                609.84
  30    16.5      -4.6         841.0           -3868.60                634.80
  41    18.5      -2.6        1190.5           -3095.30                277.16
  33    20.5      -0.6        1016.5            -609.90                 11.88
  29    22.5       1.4         852.5            1193.50                 56.84
  30    24.5       3.4         919.0            1286.60                346.80
  32    26.5       5.4        1028.0            5551.20                933.12
  22    28.5       7.4         713.0            5276.20               1204.72
  10    30.5       9.4         325.0            3055.00                883.60
 263                                             452.70               7046.68

TABLE 71
ANALYSIS OF VARIANCE OF SCORES ON FINAL TEST, COMPLETE ANALYSIS

Source of variation                   D.F.    Sum of squares    Mean square
Linear regression                       1          29.08           29.080
Departure from linear regression        9         783.41           87.046
Within arrays                         252        5950.21           16.904
Total                                 262        6762.70

$$F_0 = \frac{87.046}{16.904} = 5.15$$


The same methods could be used in testing the form of regression of 


X on Y. 

Problem XI.5. The complete procedures for the analysis of variance 
and covariance for the data of a single classification. In order to illus- 
trate how to calculate all the numerical values needed in a complete 
analysis of variance and covariance in the case of a single criterion of 
classification, how to proceed with the application of principles including 
the testing of underlying assumptions, and how to interpret the results, 
application has been made to the following problem. We wish to 
systematize the operations involved in the analysis in the most efficient 
way. 
The primary data are given in Table 72, which gives the initial and 
final scores on a test of educational development, and the mental ages of 
54 high-school students classified by grades, 18 students in each of the 


tenth, eleventh, and twelfth grades.

We wish to test the hypothesis that educational development is independent of grade, that is, that the mean achievements of students in the three grades are equal. The complete analysis is in three parts:

In Part I, we calculate all the values required for the complete analysis and carry out the analysis of variance on the final test scores;

In Part II, we give the complete procedures for the analysis of vari- 
ance and covariance with one independent variable; 

In Part III, we present the complete procedures for the analysis of 
variance and covariance with two independent variables. 


TABLE 72
MENTAL AGES, INITIAL AND FINAL SCORES ON AN EDUCATIONAL DEVELOPMENT TEST OF 54 HIGH-SCHOOL STUDENTS CLASSIFIED BY GRADE*

        Grade 10                   Grade 11                   Grade 12
Final    M.A.   Initial    Final    M.A.   Initial    Final    M.A.   Initial
Y_t1     X_t1    Z_t1      Y_t2     X_t2    Z_t2      Y_t3     X_t3    Z_t3
 30       45      28        26       62      22        29       60      25
 25       58      22        26       57      21        29       88      24
 22       46      19        24       65      21        22       64      19
 26       56      22        24       54      25        23       64      21
 17       19      14        23       55      18        20       47      17
 14       29      14        15       24      13        19       75      17
 18       34      18        18       40      17        17       29      16
 17       17      14        16       24      13        15       38      15
 12       19       9        13       23      12        14       28      12
 21       44      16        26       60      22        33       94      29
 21       44      21        25       57      22        29       89      29
 19        6      17        23       52      19        25       78      22
 20       38      18        22       54      19        23       50      21
 18       27      16        21       54      19        18       57      19
 14       18      14        17       52      16        17       43      17
 14       18       9        19       40      17        15       36      13
 12       18       7        15       28      12        15       35      14
  9        5       7        13       48      12        10       14       9

* Mental age, in terms of months, has been reduced by 100. Define Y_ts, X_ts, and Z_ts as the final, mental age, and initial scores, respectively, for the tth individual in the sth group; where s = 1, 2, 3, denoting grade 10, 11, 12, respectively, and t = 1, 2, ..., 18.


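Part I

Step 1. Calculate the following values:
(Some of the values reported here were calculated for later use and need not be considered in the analysis-of-variance procedure.)

$a_{11} = \sum_t Y_{t1}^2 = 900 + \cdots + 81 = 6511$
$a_{21} = \sum_t Y_{t2}^2 = 676 + \cdots + 169 = 7806$
$a_{31} = \sum_t Y_{t3}^2 = 841 + \cdots + 100 = 8413$
$a_{12} = \sum_t X_{t1}^2 = 2025 + \cdots + 25 = 20{,}727$
$a_{22} = \sum_t X_{t2}^2 = 3844 + \cdots + 2304 = 43{,}317$
$a_{32} = \sum_t X_{t3}^2 = 3600 + \cdots + 196 = 63{,}595$
$a_{13} = \sum_t Z_{t1}^2 = 784 + \cdots + 49 = 5047$
$a_{23} = \sum_t Z_{t2}^2 = 484 + \cdots + 144 = 5970$
$a_{33} = \sum_t Z_{t3}^2 = 625 + \cdots + 81 = 6909$
$a_{14} = \sum_t (Y_{t1}X_{t1}) = 1350 + \cdots + 45 = 11{,}099$
$a_{24} = \sum_t (Y_{t2}X_{t2}) = 1612 + \cdots + 624 = 18{,}169$
$a_{34} = \sum_t (Y_{t3}X_{t3}) = 1740 + \cdots + 140 = 22{,}737$
$a_{15} = \sum_t (Y_{t1}Z_{t1}) = 840 + \cdots + 63 = 5701$
$a_{25} = \sum_t (Y_{t2}Z_{t2}) = 572 + \cdots + 156 = 6808$
$a_{35} = \sum_t (Y_{t3}Z_{t3}) = 725 + \cdots + 90 = 7607$
$a_{16} = \sum_t (X_{t1}Z_{t1}) = 1260 + \cdots + 35 = 9756$
$a_{26} = \sum_t (X_{t2}Z_{t2}) = 1364 + \cdots + 576 = 15{,}884$
$a_{36} = \sum_t (X_{t3}Z_{t3}) = 1500 + \cdots + 126 = 20{,}587$

$c_{11} = \left(\sum_t Y_{t1}\right)^2 / 18 = (329)^2 / 18 = 6013$
$c_{21} = \left(\sum_t Y_{t2}\right)^2 / 18 = (366)^2 / 18 = 7442$
$c_{31} = \left(\sum_t Y_{t3}\right)^2 / 18 = (373)^2 / 18 = 7729$
$c_{12} = \left(\sum_t X_{t1}\right)^2 / 18 = (541)^2 / 18 = 16{,}260$
$c_{22} = \left(\sum_t X_{t2}\right)^2 / 18 = (849)^2 / 18 = 40{,}045$
$c_{32} = \left(\sum_t X_{t3}\right)^2 / 18 = (989)^2 / 18 = 54{,}340$
$c_{13} = \left(\sum_t Z_{t1}\right)^2 / 18 = (285)^2 / 18 = 4513$
$c_{23} = \left(\sum_t Z_{t2}\right)^2 / 18 = (320)^2 / 18 = 5689$
$c_{33} = \left(\sum_t Z_{t3}\right)^2 / 18 = (339)^2 / 18 = 6385$
$c_{14} = \left(\sum_t Y_{t1}\right)\left(\sum_t X_{t1}\right) / 18 = (329)(541) / 18 = 9888$
$c_{24} = \left(\sum_t Y_{t2}\right)\left(\sum_t X_{t2}\right) / 18 = (366)(849) / 18 = 17{,}263$
$c_{34} = \left(\sum_t Y_{t3}\right)\left(\sum_t X_{t3}\right) / 18 = (373)(989) / 18 = 20{,}494$
$c_{15} = \left(\sum_t Y_{t1}\right)\left(\sum_t Z_{t1}\right) / 18 = (329)(285) / 18 = 5209$
$c_{25} = \left(\sum_t Y_{t2}\right)\left(\sum_t Z_{t2}\right) / 18 = (366)(320) / 18 = 6507$
$c_{35} = \left(\sum_t Y_{t3}\right)\left(\sum_t Z_{t3}\right) / 18 = (373)(339) / 18 = 7025$
$c_{16} = \left(\sum_t X_{t1}\right)\left(\sum_t Z_{t1}\right) / 18 = (541)(285) / 18 = 8566$
$c_{26} = \left(\sum_t X_{t2}\right)\left(\sum_t Z_{t2}\right) / 18 = (849)(320) / 18 = 15{,}093$
$c_{36} = \left(\sum_t X_{t3}\right)\left(\sum_t Z_{t3}\right) / 18 = (989)(339) / 18 = 18{,}626$

$d_1 = \left(\sum_s \sum_t Y_{ts}\right)^2 / 54 = (1068)^2 / 54 = 21{,}123$
$d_2 = \left(\sum_s \sum_t X_{ts}\right)^2 / 54 = (2379)^2 / 54 = 104{,}808$
$d_3 = \left(\sum_s \sum_t Z_{ts}\right)^2 / 54 = (944)^2 / 54 = 16{,}503$
$d_4 = \left(\sum_s \sum_t Y_{ts}\right)\left(\sum_s \sum_t X_{ts}\right) / 54 = (1068)(2379) / 54 = 47{,}051$
$d_5 = \left(\sum_s \sum_t Y_{ts}\right)\left(\sum_s \sum_t Z_{ts}\right) / 54 = (1068)(944) / 54 = 18{,}670$
$d_6 = \left(\sum_s \sum_t X_{ts}\right)\left(\sum_s \sum_t Z_{ts}\right) / 54 = (2379)(944) / 54 = 41{,}588$

$a_1 = \sum_s \sum_t Y_{ts}^2 = a_{11} + a_{21} + a_{31} = 22{,}730$
$a_2 = \sum_s \sum_t X_{ts}^2 = a_{12} + a_{22} + a_{32} = 127{,}639$
$a_3 = \sum_s \sum_t Z_{ts}^2 = a_{13} + a_{23} + a_{33} = 17{,}926$
$a_4 = \sum_s \sum_t (Y_{ts}X_{ts}) = a_{14} + a_{24} + a_{34} = 52{,}005$
$a_5 = \sum_s \sum_t (Y_{ts}Z_{ts}) = a_{15} + a_{25} + a_{35} = 20{,}116$
$a_6 = \sum_s \sum_t (X_{ts}Z_{ts}) = a_{16} + a_{26} + a_{36} = 46{,}227$

$c_1 = \sum_s \left(\sum_t Y_{ts}\right)^2 / 18 = c_{11} + c_{21} + c_{31} = 21{,}184$
$c_2 = \sum_s \left(\sum_t X_{ts}\right)^2 / 18 = c_{12} + c_{22} + c_{32} = 110{,}645$
$c_3 = \sum_s \left(\sum_t Z_{ts}\right)^2 / 18 = c_{13} + c_{23} + c_{33} = 16{,}587$
$c_4 = \sum_s \left(\sum_t Y_{ts}\right)\left(\sum_t X_{ts}\right) / 18 = c_{14} + c_{24} + c_{34} = 47{,}645$
$c_5 = \sum_s \left(\sum_t Y_{ts}\right)\left(\sum_t Z_{ts}\right) / 18 = c_{15} + c_{25} + c_{35} = 18{,}741$
$c_6 = \sum_s \left(\sum_t X_{ts}\right)\left(\sum_t Z_{ts}\right) / 18 = c_{16} + c_{26} + c_{36} = 42{,}285$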
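Step 2. Calculate the sum of squares of y for each group.

Define

$\sum_t y_{st}^2 = \sum_t (Y_{st} - \bar{Y}_s)^2$, where $\bar{Y}_s = \frac{\sum_t Y_{st}}{18}$

It is obvious that

$\sum_t y_{st}^2 = \sum_t Y_{st}^2 - \frac{(\sum_t Y_{st})^2}{18}$

Therefore, we have

$\theta_1 = a_{11} - c_{11} = 498 = \sum_t y_{1t}^2$
$\theta_2 = a_{21} - c_{21} = 364 = \sum_t y_{2t}^2$
$\theta_3 = a_{31} - c_{31} = 684 = \sum_t y_{3t}^2$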
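
The identity used in Step 2 can be checked numerically. The sketch below is only an illustration: the function name and the short list of scores are assumptions made for the example and are not the data of this problem.

```python
def corrected_sum_of_squares(values):
    """Sum of squared deviations from the mean, via the raw-score identity."""
    n = len(values)
    raw = sum(v * v for v in values)        # plays the role of a_s1
    correction = sum(values) ** 2 / n       # plays the role of c_s1
    return raw - correction

# Hypothetical scores, chosen only to exercise the function.
scores = [30, 27, 22, 20, 18, 14, 14, 12, 9]
theta = corrected_sum_of_squares(scores)

# Direct definition: sum of (Y - mean)^2.
mean = sum(scores) / len(scores)
direct = sum((v - mean) ** 2 for v in scores)
print(theta, direct)
```

Both routes give the same quantity; the raw-score form is the one that suits desk calculation, since it needs only the running totals of Step 1.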


Step 3. Use the $L_1$-criterion to test the hypothesis $H_1: \sigma_s = \sigma$. The
calculations involved are summarized in Table 73.


Step 4. Calculate the following values for the analysis of variance
of y. The sums of squares for the different sources of variation are
(see Step 1):

(1) Within grades = $a_1 - c_1$ = 1546
(2) Between grades = $c_1 - d_1$ = 61
(3) Total = $a_1 - d_1$ = 1607


TABLE 73
$L_1$-CALCULATIONS FOR $H_1: \sigma_s = \sigma$

  f_s    n_s    θ_s    log θ_s
   17     18    498    2.6972
   17     18    364    2.5611
   17     18    684    2.8351
  ---    ---   ----
   51     54   1546

$\sum_s n_s \log n_s = 67.7862$        $\sum_s n_s \log \theta_s = 145.6812$

$\log L_1 = \log N - \frac{1}{N}\sum_s n_s \log n_s + \frac{1}{N}\sum_s n_s \log \theta_s - \log\left(\sum_s \theta_s\right)$
$= \log 54 - \frac{1}{54}(67.7862) + \frac{1}{54}(145.6812) - \log 1546$
$= 1.7324 - 1.2553 + 2.6978 - 3.1892$
$= 9.9857 - 10$

$L_1 = .968$

Refer to Nayer's tables of $L_1$ (Table V, Appendix) with k = 3 and degrees of
freedom f = 17. We have P > .05. Therefore we accept $H_1$. Assuming that the
three groups have common variance, we may combine the results.
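
As a cross-check on Table 73, the logarithmic formula for $L_1$ can be evaluated directly. This is a minimal sketch: the function name `l1_criterion` is an assumption for illustration, and the only inputs are the within-group sums of squares and group sizes quoted above.

```python
import math

def l1_criterion(thetas, sizes):
    """Evaluate L1 from within-group sums of squares and group sizes.

    Implements log L1 = log N - (1/N) sum n log n
                        + (1/N) sum n log theta - log(sum theta),
    with common (base-10) logarithms as in the table.
    """
    n_total = sum(sizes)
    log_l1 = (
        math.log10(n_total)
        - sum(n * math.log10(n) for n in sizes) / n_total
        + sum(n * math.log10(t) for n, t in zip(sizes, thetas)) / n_total
        - math.log10(sum(thetas))
    )
    return 10 ** log_l1

# Values from Part I, Step 2: three grades of 18 pupils each.
l1 = l1_criterion([498, 364, 684], [18, 18, 18])
print(round(l1, 3))
```

The result agrees with the tabled value of .968 to three places; the comparison with Nayer's tables still has to be made by hand.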


Step 5. Analysis of variance to test the hypothesis $H_0: \bar{Y}_s = \bar{Y}$.
The results are summarized in Table 74.


TABLE 74
ANALYSIS OF VARIANCE OF FINAL SCORE FOR DIFFERENT GRADES

Source of variation   D.F.   Sum of squares   Mean square     F     Hypothesis tested
Within grades          51        1546           30.31        ....        ....
Between grades          2          61           30.50        1.01      Accepted
Total                  53        1607

where F = (mean square of between grades) / (mean square of within grades)

Refer to Snedecor's tables of F (Table IV, Appendix) with $n_1 = 2$ and $n_2 = 51$.
We have P > .05. Therefore, we accept the hypothesis $H_0$ and conclude that there
are no significant differences among the means of the three grades.
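
The F ratio of Table 74 follows from the sums of squares and degrees of freedom alone. A short sketch, with `f_ratio` as a hypothetical helper name:

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Ratio of the between-groups mean square to the within-groups mean square."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within

# Sums of squares and degrees of freedom from Table 74.
f_value = f_ratio(61, 2, 1546, 51)
print(round(f_value, 2))
```

The significance level is then read from the F table with 2 and 51 degrees of freedom, as in the text.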


Part II 


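Complete procedures for the analysis of variance and covariance with
one independent variable.

Step 1. Calculate the following values (see Part I, Steps 1 and 2):

$\sum_t x_{1t}^2 = \sum_t X_{1t}^2 - \frac{(\sum_t X_{1t})^2}{18} = a_{12} - c_{12} = 4467$
$\sum_t x_{2t}^2 = \sum_t X_{2t}^2 - \frac{(\sum_t X_{2t})^2}{18} = a_{22} - c_{22} = 3272$
$\sum_t x_{3t}^2 = \sum_t X_{3t}^2 - \frac{(\sum_t X_{3t})^2}{18} = a_{32} - c_{32} = 9255$

$\sum_t (y_{1t}x_{1t}) = \sum_t (Y_{1t}X_{1t}) - \frac{(\sum_t Y_{1t})(\sum_t X_{1t})}{18} = a_{14} - c_{14} = 1211$
$\sum_t (y_{2t}x_{2t}) = \sum_t (Y_{2t}X_{2t}) - \frac{(\sum_t Y_{2t})(\sum_t X_{2t})}{18} = a_{24} - c_{24} = 906$
$\sum_t (y_{3t}x_{3t}) = \sum_t (Y_{3t}X_{3t}) - \frac{(\sum_t Y_{3t})(\sum_t X_{3t})}{18} = a_{34} - c_{34} = 2243$

From Part I, Step 2, we get

$\sum_t y_{1t}^2 = 498$
$\sum_t y_{2t}^2 = 364$
$\sum_t y_{3t}^2 = 684$

Then, we have

$M_1 = \frac{[\sum_t (y_{1t}x_{1t})]^2}{\sum_t x_{1t}^2} = \frac{(1211)^2}{4467} = 328$
$M_2 = \frac{[\sum_t (y_{2t}x_{2t})]^2}{\sum_t x_{2t}^2} = \frac{(906)^2}{3272} = 251$
$M_3 = \frac{[\sum_t (y_{3t}x_{3t})]^2}{\sum_t x_{3t}^2} = \frac{(2243)^2}{9255} = 544$

Define

Adjusted $\sum_t y_{st}^2 = \sum_t (y_{st} - b_s x_{st})^2$, where $b_s = \frac{\sum_t (y_{st}x_{st})}{\sum_t x_{st}^2}$

By simple algebraic operation, we have

Adjusted $\sum_t y_{st}^2 = \sum_t y_{st}^2 - \frac{[\sum_t (y_{st}x_{st})]^2}{\sum_t x_{st}^2} = \sum_t y_{st}^2 - M_s$

Define

$\theta_s' = \text{adjusted } \sum_t y_{st}^2$

Therefore, we have

$\theta_1' = \sum_t y_{1t}^2 - M_1 = 170$
$\theta_2' = \sum_t y_{2t}^2 - M_2 = 113$
$\theta_3' = \sum_t y_{3t}^2 - M_3 = 140$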
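
The adjustment for one covariate in Step 1 amounts to subtracting the regression sum of squares from each within-group sum of squares. The following sketch recomputes the three adjusted values from the figures above; the function name is an assumption made for the example.

```python
def adjusted_sum_of_squares(sum_y2, sum_x2, sum_yx):
    """Residual sum of squares of y after linear regression on x.

    b = sum_yx / sum_x2 is the within-group regression coefficient,
    and b * sum_yx is the reduction called M_s in the text.
    """
    b = sum_yx / sum_x2
    return sum_y2 - b * sum_yx

# (sum y^2, sum x^2, sum yx) for grades 10, 11, 12, from Part II, Step 1.
groups = [(498, 4467, 1211), (364, 3272, 906), (684, 9255, 2243)]
adjusted = [round(adjusted_sum_of_squares(*g)) for g in groups]
print(adjusted, sum(adjusted))
```

The rounded values sum to 423, the total whose logarithm enters the $L_1$ calculation of the next step.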


Step 2. Use the $L_1$-criterion to test the hypothesis $H_1': \sigma_{y \cdot x, s} = \sigma_{y \cdot x}$.
The calculations involved are summarized in Table 75.


TABLE 75
$L_1$-CALCULATIONS FOR $H_1': \sigma_{y \cdot x, s} = \sigma_{y \cdot x}$

  f_s    n_s    θ_s'    log θ_s'
   16     18    170     2.2305
   16     18    113     2.0531
   16     18    140     2.1461
  ---    ---    ---
   48     54    423

$\sum_s n_s \log n_s = 67.7862$        $\sum_s n_s \log \theta_s' = 115.7346$

$\log L_1 = \log N - \frac{1}{N}\sum_s n_s \log n_s + \frac{1}{N}\sum_s n_s \log \theta_s' - \log\left(\sum_s \theta_s'\right)$
$= \log 54 - \frac{1}{54}(67.7862) + \frac{1}{54}(115.7346) - \log 423$
$= 1.7324 - 1.2553 + 2.1432 - 2.6263 = 9.9940 - 10$

$L_1 = .986$

Refer to Nayer's tables of $L_1$ with k = 3 and degrees of freedom f = 16. We
have P > .05. Therefore, we accept $H_1'$ and combine the results.

Step 3. Calculate the following values for the analysis of variance of
y and x and the covariance of yx (with x held constant). The sums of
squares and of products for the different sources of variation are (see
Part I, Step 1):

(1) Within grades:
    $\sum y^2 = a_1 - c_1 = 1546 = A_0$
    $\sum x^2 = a_2 - c_2 = 16,994 = B_0$
    $\sum yx = a_4 - c_4 = 4360 = D_0$

(2) Between grades:
    $\sum y^2 = c_1 - d_1 = 61 = A_1$
    $\sum x^2 = c_2 - d_2 = 5837 = B_1$
    $\sum yx = c_4 - d_4 = 594 = D_1$

(3) Total:
    $\sum y^2 = a_1 - d_1 = 1607 = A$
    $\sum x^2 = a_2 - d_2 = 22,831 = B$
    $\sum yx = a_4 - d_4 = 4954 = D$

Step 4. Calculate $b\sum yx$ for "within" and "total," where

$b\sum yx = \frac{(\sum yx)^2}{\sum x^2}$

Refer to Step 3; we have

(1) Within grades: $b\sum yx = \frac{D_0^2}{B_0} = 1119 = M_0$
(2) Total: $b\sum yx = \frac{D^2}{B} = 1075 = M$

Step 5. Calculate adjusted $\sum y^2$ for "within" and "total," and
reduced $\sum y^2$ for "between."

(1) Within grades: Adjusted $\sum y^2 = A_0 - M_0 = 427 = P_0$

(2) Total: Adjusted $\sum y^2 = A - M = 532 = P$

(3) Between grades: Reduced $\sum y^2 = P - P_0 = 105$
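
Steps 3 to 5 can be traced in a few lines of code, which also forms the F ratio that the next step reports. This is a sketch under stated assumptions: the function name and argument layout are illustrative only, and because it carries unrounded intermediate values its F differs slightly from a figure built on rounded sums of squares.

```python
def ancova_one_covariate(within, total, df_within, k):
    """Analysis of covariance with one covariate from sums of squares and products.

    within and total are (sum y^2, sum x^2, sum yx) tuples.
    Returns adjusted within SS, adjusted total SS, reduced between SS, and F.
    """
    a0, b0, d0 = within
    a, b, d = total
    p0 = a0 - d0 ** 2 / b0          # adjusted within-grades sum of squares
    p = a - d ** 2 / b              # adjusted total sum of squares
    reduced = p - p0                # reduced between-grades sum of squares
    df_adjusted = df_within - 1     # one degree of freedom lost to the regression
    f_value = (reduced / (k - 1)) / (p0 / df_adjusted)
    return p0, p, reduced, f_value

# A0, B0, D0 and A, B, D from Step 3; 51 within-grade d.f.; k = 3 grades.
p0, p, reduced, f_value = ancova_one_covariate(
    (1546, 16994, 4360), (1607, 22831, 4954), df_within=51, k=3
)
print(round(p0), round(p), round(reduced), round(f_value, 2))
```

Rounded to whole numbers, the adjusted sums of squares reproduce 427, 532, and 105.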

Step 6. Analysis of variance and covariance to test the hypothesis
$H_0': \bar{Y}_s = \bar{Y}$ with X held constant. The results are summarized in
Table 76.


TABLE 76
ANALYSIS OF VARIANCE AND COVARIANCE OF FINAL SCORE WITH MENTAL AGE HELD
CONSTANT

                             Adjusted or reduced
Source of variation    D.F.    S.S.    M.S.      F      Hypothesis
Within grades           50     427      8.54    ....      ....
Between grades           2     105     52.50    6.148    Rejected
Total                   52     532

Refer to Snedecor's tables of F with $n_1 = 2$ and $n_2 = 50$. We have P < .01.
Therefore, we reject the hypothesis $H_0'$ and conclude that there are significant
differences among the means of final scores for these three grades with the effects
of mental age partialed out.

Part III


Complete procedures for the analysis of variance and covariance with
two independent variables.

Step 1. Calculate the following values (see Part I, Steps 1 and 2):

$\sum_t z_{1t}^2 = \sum_t Z_{1t}^2 - \frac{(\sum_t Z_{1t})^2}{18} = a_{13} - c_{13} = 534$
$\sum_t z_{2t}^2 = \sum_t Z_{2t}^2 - \frac{(\sum_t Z_{2t})^2}{18} = a_{23} - c_{23} = 281$
$\sum_t z_{3t}^2 = \sum_t Z_{3t}^2 - \frac{(\sum_t Z_{3t})^2}{18} = a_{33} - c_{33} = 524$

$\sum_t (y_{1t}z_{1t}) = \sum_t (Y_{1t}Z_{1t}) - \frac{(\sum_t Y_{1t})(\sum_t Z_{1t})}{18} = a_{15} - c_{15} = 492$
$\sum_t (y_{2t}z_{2t}) = \sum_t (Y_{2t}Z_{2t}) - \frac{(\sum_t Y_{2t})(\sum_t Z_{2t})}{18} = a_{25} - c_{25} = 301$
$\sum_t (y_{3t}z_{3t}) = \sum_t (Y_{3t}Z_{3t}) - \frac{(\sum_t Y_{3t})(\sum_t Z_{3t})}{18} = a_{35} - c_{35} = 582$

$\sum_t (x_{1t}z_{1t}) = \sum_t (X_{1t}Z_{1t}) - \frac{(\sum_t X_{1t})(\sum_t Z_{1t})}{18} = a_{16} - c_{16} = 1190$
$\sum_t (x_{2t}z_{2t}) = \sum_t (X_{2t}Z_{2t}) - \frac{(\sum_t X_{2t})(\sum_t Z_{2t})}{18} = a_{26} - c_{26} = 751$
$\sum_t (x_{3t}z_{3t}) = \sum_t (X_{3t}Z_{3t}) - \frac{(\sum_t X_{3t})(\sum_t Z_{3t})}{18} = a_{36} - c_{36} = 1961$

From Part I, Step 2, and Part II, Step 1, we also have

$\sum_t y_{1t}^2 = 498$      $\sum_t x_{1t}^2 = 4467$      $\sum_t (y_{1t}x_{1t}) = 1211$
$\sum_t y_{2t}^2 = 364$      $\sum_t x_{2t}^2 = 3272$      $\sum_t (y_{2t}x_{2t}) = 906$
$\sum_t y_{3t}^2 = 684$      $\sum_t x_{3t}^2 = 9255$      $\sum_t (y_{3t}x_{3t}) = 2243$

Then, we have

$M_s = \frac{\sum_t z_{st}^2 [\sum_t (y_{st}x_{st})]^2 - \sum_t (x_{st}z_{st}) \sum_t (y_{st}z_{st}) \sum_t (y_{st}x_{st})}{\sum_t x_{st}^2 \sum_t z_{st}^2 - [\sum_t (x_{st}z_{st})]^2}$

$M_1 = \frac{534(1211)^2 - 1190(492)(1211)}{4467(534) - (1190)^2} = \frac{783,122,214 - 709,016,280}{969,278} = \frac{74,105,934}{969,278} = 76$

$M_2 = \frac{281(906)^2 - 751(301)(906)}{3272(281) - (751)^2} = \frac{230,654,916 - 204,802,206}{355,431} = \frac{25,852,710}{355,431} = 73$

$M_3 = \frac{524(2243)^2 - 1961(582)(2243)}{9255(524) - (1961)^2} = \frac{2,636,269,676 - 2,559,940,386}{1,004,099} = \frac{76,329,290}{1,004,099} = 76$

$N_s = \frac{\sum_t x_{st}^2 [\sum_t (y_{st}z_{st})]^2 - \sum_t (x_{st}z_{st}) \sum_t (y_{st}x_{st}) \sum_t (y_{st}z_{st})}{\sum_t x_{st}^2 \sum_t z_{st}^2 - [\sum_t (x_{st}z_{st})]^2}$

$N_1 = \frac{4467(492)^2 - 709,016,280}{969,278} = \frac{1,081,299,888 - 709,016,280}{969,278} = \frac{372,283,608}{969,278} = 384$

$N_2 = \frac{3272(301)^2 - 204,802,206}{355,431} = \frac{297,431,344 - 204,802,206}{355,431} = \frac{92,629,138}{355,431} = 261$

$N_3 = \frac{9255(582)^2 - 2,559,940,386}{1,004,099} = \frac{3,134,890,620 - 2,559,940,386}{1,004,099} = \frac{574,950,234}{1,004,099} = 573$

Define

Adjusted $\sum_t y_{st}^2 = \sum_t (y_{st} - b_{1s}x_{st} - b_{2s}z_{st})^2$

where

$b_{1s} = \frac{\sum_t z_{st}^2 \sum_t (y_{st}x_{st}) - \sum_t (x_{st}z_{st}) \sum_t (y_{st}z_{st})}{\sum_t x_{st}^2 \sum_t z_{st}^2 - [\sum_t (x_{st}z_{st})]^2}$

$b_{2s} = \frac{\sum_t x_{st}^2 \sum_t (y_{st}z_{st}) - \sum_t (x_{st}z_{st}) \sum_t (y_{st}x_{st})}{\sum_t x_{st}^2 \sum_t z_{st}^2 - [\sum_t (x_{st}z_{st})]^2}$

By troublesome algebraic operation, we have

Adjusted $\sum_t y_{st}^2 = \sum_t y_{st}^2 - M_s - N_s$

Define $\theta_s'' = \text{adjusted } \sum_t y_{st}^2$

Therefore, we have

$\theta_1'' = \sum_t y_{1t}^2 - M_1 - N_1 = 38$
$\theta_2'' = \sum_t y_{2t}^2 - M_2 - N_2 = 30$
$\theta_3'' = \sum_t y_{3t}^2 - M_3 - N_3 = 35$
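
The two reductions $M_s$ and $N_s$ are the products of each partial regression coefficient with its own sum of products. A minimal sketch for the first grade follows; the function name is an assumption, and the inputs are the six sums of squares and products quoted in Step 1.

```python
def two_covariate_reductions(y2, x2, z2, yx, yz, xz):
    """Reductions in sum y^2 due to regression on x and z within one group.

    Solves the two normal equations for b1 (on x) and b2 (on z) in closed
    form, then returns M = b1 * sum yx, N = b2 * sum yz, and the adjusted
    sum of squares y2 - M - N.
    """
    denominator = x2 * z2 - xz ** 2
    b1 = (z2 * yx - xz * yz) / denominator
    b2 = (x2 * yz - xz * yx) / denominator
    m = b1 * yx
    n = b2 * yz
    return m, n, y2 - m - n

# Grade 10 values from Step 1.
m1, n1, theta1 = two_covariate_reductions(498, 4467, 534, 1211, 492, 1190)
print(round(m1), round(n1), round(theta1, 2))
```

Carrying unrounded values gives an adjusted sum of squares a little under 38; the text's 38 comes from subtracting the rounded reductions 76 and 384.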




Step 2. Use the $L_1$-criterion to test the hypothesis $H_1'': \sigma_{y \cdot xz, s} = \sigma_{y \cdot xz}$.
The calculations involved are summarized in Table 77.


TABLE 77
$L_1$-CALCULATIONS FOR $H_1'': \sigma_{y \cdot xz, s} = \sigma_{y \cdot xz}$

  f_s    n_s    θ_s''    log θ_s''
   15     18     38      1.5798
   15     18     30      1.4771
   15     18     35      1.5441
  ---    ---    ---
   45     54    103

$\sum_s n_s \log n_s = 67.7862$        $\sum_s n_s \log \theta_s'' = 82.8180$

$\log L_1 = \log N - \frac{1}{N}\sum_s n_s \log n_s + \frac{1}{N}\sum_s n_s \log \theta_s'' - \log\left(\sum_s \theta_s''\right)$
$= \log 54 - \frac{1}{54}(67.7862) + \frac{1}{54}(82.8180) - \log 103$
$= 1.7324 - 1.2553 + 1.5337 - 2.0128 = 9.9980 - 10$

$L_1 = .995$

Refer to Nayer's tables of $L_1$ with k = 3 and degrees of freedom f = 15. We
have P > .05. Therefore, we accept $H_1''$ and combine the results.

Step 3. Calculate the necessary values for the analysis of variance of
x, y, and z and covariance of yx, yz, and xz (with both x and z held
constant). The sums of squares and of products for the different sources
of variation are (see Part I, Step 1 and Part II, Step 3):

(1) Within grades:
    $\sum y^2 = 1546 = A_0$
    $\sum x^2 = 16,994 = B_0$
    $\sum z^2 = a_3 - c_3 = 1339 = C_0$
    $\sum yx = 4360 = D_0$
    $\sum yz = a_5 - c_5 = 1375 = E_0$
    $\sum xz = a_6 - c_6 = 3942 = F_0$

(2) Between grades:
    $\sum y^2 = 61 = A_1$
    $\sum x^2 = 5837 = B_1$
    $\sum z^2 = c_3 - d_3 = 84 = C_1$
    $\sum yx = 594 = D_1$
    $\sum yz = c_5 - d_5 = 71 = E_1$
    $\sum xz = c_6 - d_6 = 697 = F_1$

(3) Total:
    $\sum y^2 = 1607 = A$
    $\sum x^2 = 22,831 = B$
    $\sum z^2 = a_3 - d_3 = 1423 = C$
    $\sum yx = 4954 = D$
    $\sum yz = a_5 - d_5 = 1446 = E$
    $\sum xz = a_6 - d_6 = 4639 = F$

Step 4. Calculate $b_1\sum yx$ and $b_2\sum yz$ for "within" and "total,"
where

$b_1\sum yx = \frac{\sum z^2 (\sum yx)^2 - \sum xz \sum yz \sum yx}{\sum x^2 \sum z^2 - (\sum xz)^2}$

$b_2\sum yz = \frac{\sum x^2 (\sum yz)^2 - \sum xz \sum yx \sum yz}{\sum x^2 \sum z^2 - (\sum xz)^2}$

Refer to Step 3. We have

(1) Within grades:
    $b_1\sum yx = \frac{C_0 D_0^2 - F_0 E_0 D_0}{B_0 C_0 - F_0^2} = 252 = M_0'$
    $b_2\sum yz = \frac{B_0 E_0^2 - F_0 D_0 E_0}{B_0 C_0 - F_0^2} = 1178 = N_0'$

(2) Total:
    $b_1\sum yx = \frac{C D^2 - F E D}{B C - F^2} = 154 = M'$
    $b_2\sum yz = \frac{B E^2 - F D E}{B C - F^2} = 1323 = N'$

Step 5. Calculate adjusted $\sum y^2$ for "within" and "total," and
reduced $\sum y^2$ for "between."

(1) Within grades: adjusted $\sum y^2 = A_0 - M_0' - N_0' = 116 = P_0'$

(2) Total: adjusted $\sum y^2 = A - M' - N' = 130 = P'$

(3) Between grades: reduced $\sum y^2 = P' - P_0' = 14$

Step 6. Analysis of variance and covariance to test the hypothesis
$H_0'': \bar{Y}_s = \bar{Y}$ with both X and Z held constant. The results are
summarized in Table 78.


TABLE 78
ANALYSIS OF VARIANCE AND COVARIANCE OF FINAL SCORE WITH BOTH MENTAL AGE
AND INITIAL SCORE HELD CONSTANT

                                                                          Adjusted or reduced
Source of variation   D.F.    Σy²     Σx²     Σz²    Σyx    Σyz    Σxz    D.F.   S.S.   M.S.    F     Hypothesis
Within grades          51    1,546  16,994  1,339  4,360  1,375  3,942     49    116    2.37   ....     ....
Between grades          2       61   5,837     84    594     71    697      2     14    7.00   2.95   Accepted
Total                  53    1,607  22,831  1,423  4,954  1,446  4,639     51    130

Refer to Snedecor's tables of F with $n_1 = 2$ and $n_2 = 49$. We have P > .05. So
we accept the hypothesis $H_0''$ and conclude that there are no significant differences
among the means of final scores for these three grades with both the effects of mental
age and initial score partialed out.
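
The whole of Part III, Steps 4 to 6, reduces to one formula applied twice, once to the within-grades sums and once to the totals. The sketch below is an illustration only; the function name is an assumption, and since it keeps unrounded intermediate values its F is close to, but not identical with, the tabled 2.95.

```python
def adjusted_for_two_covariates(a, b, c, d, e, f):
    """Adjusted sum y^2 after regression on x and z.

    Arguments follow the text's lettering: a = sum y^2, b = sum x^2,
    c = sum z^2, d = sum yx, e = sum yz, f = sum xz.
    """
    denominator = b * c - f * f
    m = (c * d * d - f * e * d) / denominator   # b1 * sum yx
    n = (b * e * e - f * d * e) / denominator   # b2 * sum yz
    return a - m - n

p0 = adjusted_for_two_covariates(1546, 16994, 1339, 4360, 1375, 3942)  # within
p = adjusted_for_two_covariates(1607, 22831, 1423, 4954, 1446, 4639)   # total
reduced = p - p0                                                       # between

# 2 d.f. between grades; 51 - 2 = 49 adjusted d.f. within grades.
f_value = (reduced / 2) / (p0 / 49)
print(round(p0), round(p), round(reduced), round(f_value, 2))
```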


Analysis of Variance in the Case of Unequal or Disproportionate
Numbers of Observations in the Subclasses. The analysis of variance in
the case of a single criterion of classification with unequal numbers in the
subclasses introduces no new difficulty, as has been indicated in Problem
XI.2. However, when our data have been classified on the basis of two
or more criteria with unequal subclass numbers, new difficulties arise.

In agriculture and in other experimental sciences it is usually possible 
to design an experiment so that each subclass has always the same number 
of individuals. If this were a necessary condition, the use of the powerful 
tool of analysis of variance would be greatly restricted, since there are 
fields, such as those dealing with human beings—education and psy- 
chology, for instance—where unequal representation in each cell of 
multiple classification of data is of common occurrence, both in experi- 
mentation and in other observational programs, including data collected 
by governmental and state agencies. There is an urgent need, there- 
fore, for a systematic formulation of methods of attacking problems when 
unequal representation in the subclasses occurs. Methods have been 
developed for such problems (Refs. 8, 9, 10). 

Tsao (Ref. 10) has treated the problem of analysis of variance and 
covariance for unequal or disproportionate representation in the sub- 
classes by giving the mathematical solution with the specified restric- 
tions defined and by proposing new approximate methods with the 
respective statistical assumptions to be fulfilled. Our consideration of 
this problem is limited to the presentation of an approximation method of 
analysis for unequal representation in the subclasses of two classifications. 

Problem XI.6. An approximation method of analysis of variance for 
unequal frequencies in the subclasses of two classifications. We take 
the problem of testing two hypotheses: (1) that the grade means on a 
speed of reading test are equal and (2) that the school means on the 
reading test are equal. The basic data for the fifth, sixth, seventh, and 
eighth grades in each of two schools are given in Table 79, including the 
appropriate notations. The complete analysis of the problem follows. 


TABLE 79
CALCULATED MEASURES FOR SPEED SCORE IN GATES READING-SURVEY TEST

School   Grade   n_si   X̄_si    θ_si = Σ_t (X_sit − X̄_si)²
  A        5      41    49.68         6280
  A        6      39    41.08         4835
  A        7      32    42.41         3094
  A        8      36    53.25         3925
  B        5      26    33.92         3888
  B        6      27    29.22         3157
  B        7      34    32.50         3300
  B        8      32    40.53         3002

where s = 1, 2, 3, 4 represent grades 5, 6, 7, and 8, respectively

i = 1, 2 represent schools A and B, respectively

$t = 1, 2, \ldots, n_{si}$

$n_{si}$ is the number of observations for the sth grade in the ith school

$\bar{X}_{si}$ is the mean score for the sth grade in the ith school

$s'_{si}$ is the unbiased estimate of standard deviation for the sth grade in the ith
school

$\bar{X}_{si}$ and $s'_{si}$ are obtained by the following definitions:

$\bar{X}_{si} = \frac{\sum_t X_{sit}}{n_{si}}$

$s'^2_{si} = \frac{\sum_t (X_{sit} - \bar{X}_{si})^2}{n_{si} - 1}$

Step 1. Use criterion $L_1$ to test the hypothesis $H_1: \sigma_{si} = \sigma$.
Let us define:

$\theta_{si} = \sum_t x_{sit}^2 = \sum_t (X_{sit} - \bar{X}_{si})^2$


The calculations for $L_1$ are summarized in Table 80.


TABLE 80
$L_1$-CALCULATIONS FOR $H_1: \sigma_{si} = \sigma$

 f_si   n_si   log n_si   n_si log n_si    θ_si    log θ_si   n_si log θ_si
  40     41    1.6128       66.1248        6,280    3.7980      155.7180
  38     39    1.5911       62.0529        4,835    3.6844      143.6916
  31     32    1.5051       48.1632        3,094    3.4905      111.6960
  35     36    1.5563       56.0268        3,925    3.5938      129.3768
  25     26    1.4150       36.7900        3,888    3.5897       93.3322
  26     27    1.4314       38.6478        3,157    3.4993       94.4811
  33     34    1.5315       52.0710        3,300    3.5185      119.6290
  31     32    1.5051       48.1632        3,002    3.4774      111.2768
        ----               --------       ------               --------
     N = 267               408.0397       31,481               959.2015

The harmonic mean of $f_{si}$ is

$f = \frac{8}{\frac{1}{40} + \frac{1}{38} + \cdots + \frac{1}{31}} = 31.60$

$\log L_1 = \log N - \frac{1}{N}\sum_s \sum_i n_{si} \log n_{si} + \frac{1}{N}\sum_s \sum_i n_{si} \log \theta_{si} - \log \sum_s \sum_i \theta_{si}$
$= \log 267 - \frac{1}{267}(408.0397) + \frac{1}{267}(959.2015) - \log 31,481$
$= 2.4265 - 1.5282 + 3.5936 - 4.4980 = 9.9939 - 10$

Therefore, $L_1 = .986$. Refer to Nayer's tables of $L_1$ (Table V, Appendix)
with k = 8 and degrees of freedom f = 31.60. We find that P > .05.
Therefore, we accept $H_1$. We may assume that the eight groups have
a common variance, and combine the results.
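
When the subclasses differ in size, the table of $L_1$ is entered with the harmonic mean of the degrees of freedom. A short sketch of that one calculation, with `harmonic_mean` as a hypothetical helper name:

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

# Degrees of freedom f_si = n_si - 1 for the eight subclasses of Table 80.
degrees_of_freedom = [40, 38, 31, 35, 25, 26, 33, 31]
f_harmonic = harmonic_mean(degrees_of_freedom)
print(round(f_harmonic, 2))
```

The harmonic mean is never larger than the arithmetic mean, so it weights the small subclasses more heavily, which is the cautious direction when entering the table.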

Step 2. Use the $\chi^2$-criterion to test the goodness of fit for the equal
frequencies in each subclass. First calculate the mean frequency:

$\bar{n} = \frac{N}{8} = \frac{267}{8} = 33.375$

The results of the $\chi^2$ test are summarized in Table 81.


TABLE 81
CALCULATION OF $\chi^2$

  f_o      f_e     |f_o − f_e|   (f_o − f_e)²   (f_o − f_e)²/f_e
   41    33.375      7.625        58.140625         1.7420
   39    33.375      5.625        31.640625         0.9480
   32    33.375      1.375         1.890625         0.0566
   36    33.375      2.625         6.890625         0.2065
   26    33.375      7.375        54.390625         1.6297
   27    33.375      6.375        40.640625         1.2177
   34    33.375      0.625         0.390625         0.0117
   32    33.375      1.375         1.890625         0.0566
  ---   -------                                     ------
  267   267.000                              χ² =   5.8688

We find

$\chi^2 = 5.8688$

Refer to the $\chi^2$-table with d.f. = 7. We find .70 > P > .50. Therefore,
we conclude that for our data the class numbers do not differ
significantly. It is justifiable to use the approximation method.
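
The goodness-of-fit statistic of Table 81 can be reproduced in a few lines. This is a minimal sketch using only the eight observed subclass frequencies; the variable names are illustrative.

```python
# Observed subclass frequencies from Table 81.
observed = [41, 39, 32, 36, 26, 27, 34, 32]

# Under the hypothesis of equal frequencies, every cell expects the mean.
expected = sum(observed) / len(observed)

chi_square = sum((o - expected) ** 2 / expected for o in observed)
degrees_of_freedom = len(observed) - 1

print(expected, round(chi_square, 4), degrees_of_freedom)
```

The probability itself is still read from the chi-square table with 7 degrees of freedom.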

Step 3. Convert Table 79 into a table with equal frequency of 33.375
for each subclass.
Retaining the original estimate of the standard deviation, we have the
following estimates of $\sum_t x_{sit}^2$:

$\sum_t x_{11t}^2 = \frac{33.375}{41}(6280) = 5112$        $\sum_t x_{12t}^2 = \frac{33.375}{26}(3888) = 4991$

$\sum_t x_{21t}^2 = \frac{33.375}{39}(4835) = 4138$        $\sum_t x_{22t}^2 = \frac{33.375}{27}(3157) = 3902$

$\sum_t x_{31t}^2 = \frac{33.375}{32}(3094) = 3227$        $\sum_t x_{32t}^2 = \frac{33.375}{34}(3300) = 3239$

$\sum_t x_{41t}^2 = \frac{33.375}{36}(3925) = 3639$        $\sum_t x_{42t}^2 = \frac{33.375}{32}(3002) = 3131$

These results, together with the data in Table 79, are summarized
in Table 82. The notations are the same as in Table 79.


TABLE 82
EXPECTED MEASURES FOR SPEED SCORE IN GATES READING SURVEY

School   Grade      n       X̄_si    Σ_t x²_sit
  A        5     33.375    49.68      5112
  A        6     33.375    41.08      4138
  A        7     33.375    42.41      3227
  A        8     33.375    53.25      3639
  B        5     33.375    33.92      4991
  B        6     33.375    29.22      3902
  B        7     33.375    32.50      3239
  B        8     33.375    40.53      3131
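
The conversion in Step 3 multiplies each subclass sum of squares by the ratio of the mean frequency to the observed frequency. The sketch below repeats that scaling for all eight subclasses; the variable names are assumptions made for the example.

```python
# Observed frequencies and sums of squares, schools A then B, grades 5 to 8.
frequencies = [41, 39, 32, 36, 26, 27, 34, 32]
sums_of_squares = [6280, 4835, 3094, 3925, 3888, 3157, 3300, 3002]

mean_frequency = sum(frequencies) / len(frequencies)   # 33.375

# Scale each sum of squares to what it would be with mean_frequency cases.
expected_ss = [
    round(ss * mean_frequency / n)
    for ss, n in zip(sums_of_squares, frequencies)
]
print(expected_ss, sum(expected_ss))
```

The total of the rounded values is the within-subclasses sum of squares used later in the analysis.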


Step 4. Calculate the different kinds of mean scores. At least
6 decimal places should be carried, if possible. The different kinds
of mean scores are given in Table 83.


TABLE 83
DIFFERENT KINDS OF MEAN SCORES

  i \ s        1         2         3         4       X̄_.i
    1        49.68     41.08     42.41     53.25    46.605
    2        33.92     29.22     32.50     40.53    34.0425
  X̄_s.      41.800    35.150    37.455    46.890    40.32375 = X̄..
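
Because every subclass is now treated as having the same frequency, the marginal means of Table 83 are simple unweighted averages of the cell means. A sketch, with illustrative names:

```python
# Cell means by school (rows) and grade (columns 5, 6, 7, 8).
cell_means = {
    "A": [49.68, 41.08, 42.41, 53.25],
    "B": [33.92, 29.22, 32.50, 40.53],
}

# School means: average across the four grades.
school_means = {school: sum(row) / len(row) for school, row in cell_means.items()}

# Grade means: average across the two schools.
grade_means = [(a + b) / 2 for a, b in zip(cell_means["A"], cell_means["B"])]

# Grand mean: average of all eight cell means.
grand_mean = sum(school_means.values()) / len(school_means)

print(school_means, grade_means, grand_mean)
```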


Step 5. Calculate the following values:

$a = N\bar{X}_{..}^2 = 267(40.32375)^2 = 434,143$, where $N = 8n$

$c = 2n \sum_s \bar{X}_{s.}^2 = 66.75[(41.800)^2 + \cdots + (46.890)^2] = 439,503$

$d = 4n \sum_i \bar{X}_{.i}^2 = 133.5[(46.605)^2 + (34.0425)^2] = 444,678$

$e = n \sum_s \sum_i \bar{X}_{si}^2 = 33.375[(49.68)^2 + \cdots + (40.53)^2] = 450,324$

Step 6. Calculate the sum of squares for the different sources of
variation:

(1) Within subclasses: $\sum_s \sum_i \sum_t x_{sit}^2 = (5112 + \cdots + 3131) = 31,379$
    [Refer to Table 82, column (5)]

(2) Interaction: $n \sum_s \sum_i \bar{X}_{si}^2 - 2n \sum_s \bar{X}_{s.}^2 - 4n \sum_i \bar{X}_{.i}^2 + N\bar{X}_{..}^2 = e - c - d + a = 286$

(3) Between grades: $2n \sum_s \bar{X}_{s.}^2 - N\bar{X}_{..}^2 = c - a = 5360$

(4) Between schools: $4n \sum_i \bar{X}_{.i}^2 - N\bar{X}_{..}^2 = d - a = 10,535$

(5) Total: (1) + (2) + (3) + (4) = 47,560

Step 7. Analysis of variance to test different hypotheses. First, we
wish to test the hypothesis $H_1: \bar{X}_{11} - \bar{X}_{12} = \bar{X}_{21} - \bar{X}_{22} = \bar{X}_{31} - \bar{X}_{32}
= \bar{X}_{41} - \bar{X}_{42}$, or that there is no interaction between grade and school.
The results are summarized in Table 84. It is noted that if we have p
grades and q schools, then the degrees of freedom for each source of
variation are as follows:

Within subclasses      N − pq
Interaction            (p − 1)(q − 1)
Between grades         p − 1
Between schools        q − 1

Total                  N − 1

The additive property of degrees of freedom is clearly demonstrated.
From the results in Table 84, we may accept the hypothesis that the
interaction is not significantly different from zero. Therefore, we may
pool the sum of squares due to "interaction" with the "within" sum of
squares, as well as the degrees of freedom. We may call this sum "residual";
it can be used as the basis of testing the other hypotheses. (Note:
If the interaction is significant, we do not pool it with "within.") Next,
we wish to test the other two hypotheses, namely, $H_0: \bar{X}_{1.} = \bar{X}_{2.} = \bar{X}_{3.}
= \bar{X}_{4.}$ and $H_0': \bar{X}_{.1} = \bar{X}_{.2}$. The first hypothesis is that there is no
difference between the four grade means. The second hypothesis is
that there is no difference between the two school means. The results
are summarized in Table 85.


TABLE 84
RESULTS OF TESTING THE HYPOTHESIS $H_1$

Source of variation    D.F.     S.S.      M.S.     Hypothesis
Within subclasses      259     31,379    121.15      ....
Interaction              3        286     95.33    Accepted


TABLE 85
ANALYSIS OF VARIANCE FOR SPEED SCORE IN GATES READING SURVEY

Source of variation    D.F.     S.S.       M.S.        F      Hypothesis
Residual               262     31,665      120.86     ....      ....
Between grades           3      5,360    1,786.66    14.78    Rejected
Between schools          1     10,535   10,535.00    87.17    Rejected
Total                  266


From results in Table 85, we reject both the hypotheses $H_0$ and $H_0'$.
Therefore, we conclude that there are significant differences between the
means of the grades and that there is also a significant difference between
the means of the schools.
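
The pooling of interaction with the within-subclasses term, and the two F ratios of Table 85, can be sketched as follows. The function name and argument layout are assumptions for illustration; the sums of squares are those quoted in Step 6.

```python
def pooled_f_tests(ss_within, df_within, ss_interaction, df_interaction, effects):
    """Pool interaction with within, then test each main effect on the residual.

    effects maps a label to (sum of squares, degrees of freedom).
    Returns the residual mean square and a dict of F ratios.
    """
    ss_residual = ss_within + ss_interaction
    df_residual = df_within + df_interaction
    ms_residual = ss_residual / df_residual
    f_ratios = {
        name: (ss / df) / ms_residual for name, (ss, df) in effects.items()
    }
    return ms_residual, f_ratios

n_total, p, q = 267, 4, 2                 # observations, grades, schools
df_within = n_total - p * q               # 259
df_interaction = (p - 1) * (q - 1)        # 3

ms_residual, f_ratios = pooled_f_tests(
    31379, df_within, 286, df_interaction,
    {"grades": (5360, p - 1), "schools": (10535, q - 1)},
)
print(round(ms_residual, 2), {k: round(v, 2) for k, v in f_ratios.items()})
```

Pooling is appropriate here only because the interaction was judged not significant; otherwise the within-subclasses mean square alone would serve as the error term.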

PROBLEMS 
1. Is there a significant difference among the means of reaction times for 
age and for sex? 


REACTION TIMES IN SECONDS TO LIGHT AND SOUND OF VARIOUS AGE GROUPS (4-60
YEARS) ACCORDING TO SEX


Male Female 
Age Light Sound Light Sound 
group | у |__ | N 
Mean 8.р.* | Mean S.D. Mean 8.р.* | Mean §.D. 


10 .94  .1070 .94  .0928 | 10 .62 .1644 .59 .1890 
10 .24 — .0400 -23 .0409 | 10 .32 .0840 E .0407 
10 +22 .0881 19 .0338 | 10 .26 .0192 .20 .0786 
4 :0465 .24  .0141 | 10 .94 .0378 180 .1189 
10 .27 .0266 -25 .0467 | 10 .96 .0342 .80 .0872 
10 .98 .0574 87 .0806 | 10 .44 .0721 .42 .0842

* Standard deviation, Pearsonian.
2. Give a complete analysis of variance for the following data: 


REPORTED TESTS WITH STANFORD ACHIEVEMENT TEST BATTERY IN 1924 (DATA
FROM BALDWIN)

          Age    Number of cases    Mean    Unbiased S.D.
Boys        9         100           27.4       10.18
           10         117           37.9       11.63
           11          96           44.2       12.95
Girls       9         115           29.1       10.76
           10         126           38.3       10.52
           11          87           44.2       11.04

3. Test the significance of the difference between the means of students in
arithmetic computation in the different types of schools, Grade 4.


ARITHMETIC COMPUTATION SCORES BY TYPE OF SCHOOL
(After Peterson, 1948)

Score                          Frequency
interval    Boarding    Day    Mission    Non-res.    Public    Total
55-59           0         1       0          0           0         1
50-54           1         1       0          0           0         2
45-49           4         3       0          1           1         9
40-44           4        10       1          0          15        30
35-39          41        60      29          8          90       228
30-34          84       146      48         17         231       526
25-29          80       148      31         17         222       498
20-24          69       166      30         16         129       410
15-19          75       165      24         12         123       399
10-14          48       130      11          8          62       259
 5- 9          37        98       8          6          47       196
 0- 4          11        36       3          5          19        74
Total         454       964     185         90         939      2632

4. The data on the following page were obtained from the administration 
of two tests to a random sample of 132 students in a class in college 
biology. Test 1 was designed to measure the acquisition of funda- 
mental facts and principles; Test 2, to measure the ability to apply 
a knowledge of facts and principles. 


Problem: Test the linearity of regression of scores in Test 2 on scores
in Test 1.

Student Score on Student Score on Student Score on
No. Test 1 Test 2 No. Test 1 Test 2 No. Test 1 Test 2
1 63 34 45 63 34 89 53 27 
2 71 42 46 83 44 90 49 22 
3 70 41 47 80 52 91 90 49 
4 119 50 48 89 49 92 69 31 
5 109 57 49 98 44 93 52 37 
6 75 30 50 73 35 94 40 41 
7 88 33 51 65 30 95 82 40 
8 83 55 52 62 30 96 90 37 
9 68 20 53 114 54 97 108 54 
10 59 35 54 105 39 98 83 40 
11 55 43 55 88 35 99 98 37 
12 106 47 56 78 49 100 61 18 
13 56 35 57 69 51 101 80 39 
14 81 51 58 67 36 102 70 40 
15 102 48 59 79 29 103 60 30 
16 94 43 60 80 38 104 66 34 
17 97 40 61 47 36 105 71 31 
18 84 39 62 68 42 106 85 46 
19 91 51 63 93 44 107 43 26 
20 85 41 64 78 37 108 65 32 
21 106 49 65 51 34 109 53 35 
22 86 49 66 92 46 110 88 45 
23 104 41 67 76 36 111 68 41 
24 78 40 68 105 57 112 93 46 
25 91 51 69 55 32 113 91 47 
26 82 43 70 86 50 114 101 56 
27 64 34 71 71 30 115 94 40 
28 55 38 72 70 31 116 91 41 
29 87 40 73 68 28 117 73 33 
30 50 30 74 81 39 118 99 47 
31 75 40 75 81 48 119 99 45 
32 73 41 76 65 39 120 66 40
33 59 43 77 104 49 121 78 40 
34 91 48 78 88 43 122 56 37 
35 80 52 79 78 32 123 93 48 
36 105 59 80 84 40 124 85 38 
at 21 55 81 92 ат 125 58 36 
38 т 39 82 84 85 126 92 43 
39 124 52 83 78 48 127 75 31 
40 68 34 84 66 25 128 66 27 
41 100 49 85 52 33 129 69 44
42 81 34 86 52 39 130 111 50 
43 69 44 87 61 38 131 73 35 
44 78 40 88 96 43 132 73 41

5. Analyze the following data obtained for Indian students in the
twelfth grade showing scores on an arithmetic test and the number of
schools attended (after Peterson, 1948):


Number of schools attended 


Arithmetic 
comp. score 

1 2 3 4 Over 4 
95-100 0 1 0 0 0 
90- 94 0 0 0 0 0 
85- 89 0 0 0 0 0 
80- 84 0 0 0 0 0 
75- 79 2 2 2 4 0 
70- 74 18 16 13 16 9 
65- 69 12 39 20 21 20 
60- 64 15 56 30 21 8 
55- 59 14 48 23 18 18 
50- 54 8 46 22 13 13 
45- 49 2 30 21 11 8 
40- 44 2 16 19 13 6 
35- 39 3 17 9 5 3 
30- 34 1 5 7 7 0
25- 29 0 7 3 0 0 
20- 24 0 3 1 0 0 
15- 19 1 2 1 0 0 
10- 14 0 0 0 0 0 
5- 9 0 1 1 1 0 
0- 4 0 1 0 0 0 
Total 78 290 172 130 85 


6. Test the significance of the difference between the means on the achieve- 
ment test of the experimental and control groups after adjustment has 
been made for any inequalities in the two groups with respect to pretest 
and I.Q. scores. The data on the following pages derive from an 
experiment to evaluate the effectiveness of the school excursion in teach- 
ing a unit on Communication in the sixth grade in eight elementary 


schools (Clark, 1938).

PRIMARY DATA FOR SCHOOLS COMPRISING CONTROL GROUPS
                              Pretest   Final test
Individual   C.A.   M.A.      score     score

1 1:3 117 28 31 

2 146 146 34 50 

3 148 129 40 44 

4 142 142 41 51 

5 152 137 32 39 

6 143 138 40 53 

7 141 140 29 38 

8 157 145 37 39 

9 139 142 32 47 
10 141 152 32 49 
11 145 158 47 57
12 143 144 38 47 
13 146 158 39 50 
14 144 130 28 44 
15 143 169 36 49 
16 147 155 39 48 
17 143 158 51 58 
18 148 124 15 26 
19 151 149 46 53 
20 138 172 51 56 
21 140 147 34 50 
22 143 146 35 40 
23 146 140 29 37 
24 161 147 24 31 
25 142 156 39 46 
26 145 171 44 58 
27 147 141 32 49 
28 134 145 34 52 
29 151 141 33 45 
30 146 148 39 43
31 153 161 29 42
32 146 141 29 42
33 175 142 28 48
34 144 145 34 39
35 149 144 32 51 
36 150 127 24 33 
37 144 125 22 29 
38 158 134 24 40 
39 134 157 55 56 
40 142 149 27 37 
41 140 140 41 52 
42 145 146 43 46 
43 146 134 37 46 
44 145 152 38 45 
45 144 150 54 59 
46 157 147 35 45 
47 154 145 36 52 
48 139 125 16 33 
49 145 135 29 38 
50 143 150 37 48
51 171 124 11 25 
52 151 154 35 39 
53 163 149 36 44 
54 146 148 25 39 
55 139 155 35 50 
56 149 137 27 35 
57 142 128 14 25 
58 90 147 33 36 
59 140 154 27 38 
60 141 143 28 40 
61 146 128 8 23 
62 146 146 40 56 
63 148 152 49 59 
64 142 136 29 44 
65 160 140 35 40 
66 142 154 23 40 
67 145 157 41 55 
68 146 159 43 57 
69 143 136 36 48 
70 146 139 40 37 
71 143 145 21 38 
72 144 134 40 47 
73 145 161 37 45 
74 146 146 27 34 
75 145 148 38 51 
76 155 160 29 42 
77 143 145 32 49 
78 156 126 17 31 
79 139 142 23 84 
80 146 152 30 43 
81 141 166 26 38 
82 134 155 33 44 
83 142 146 34 39 
84 137 151 35 46 
85 140 142 26 42 
86 138 161 37 48 
87 150 142 28 38 
88 143 155 38 41 
89 143 137 19 34 
90 143 151 27 40
91 143 150 25 40 
92 143 146 42 51 
93 162 131 34 51 
94 154 143 43 57 
95 158 132 38 48 
96 144 149 42 55 
97 148 138 35 45 
98 145 149 49 67 
99 145 147 40 52
101 142 146 32 49
102 141 144 49 62 
103 145 141 37 54 
104 144 141 46 61 
105 139 132 28 35 
106 143 139 38 56 
107 143 150 40 58 
108 143 149 36 50 
109 142 135 26 33 
110 144 145 24 48 
111 148 157 45 52 
112 139 147 42 52 

113 151 124 35 47
114 141 129 49 50 
115 150 134 38 39 
116 145 142 42 53 
117 147 141 41 56 
118 142 142 38 47 
119 162 151 34 48 
120 184 134 23 44 
121 151 126 51 64
122 140 138 37 Evi 
123 141 141 33 44 
124 148 134 22 32 
125 145 164 47 58 
126 164 126 26 38 
127 141 147 42 58 
128 144 152 36 34 
129 149 137 24 42 
130 140 157 47 61 


131 133 121 88 52

PRIMARY DATA FOR SCHOOLS COMPRISING EXPERIMENTAL GROUPS
                              Pretest   Final test
Individual   C.A.   M.A.      score     score
1 145 145 29 48 
2 155 154 41 51
3 137 159 47 62 
4 138 148 41 54 
5 142 156 EE 62 
6 148 169 41 57 
7 148 163 49 62
8 144 146 52 65
9 147 150 47 59
10 145 149 40 54
11 146 [illegible] 41 51
12 140 [illegible] 42 55
13 147 146 41 51
14 146 153 44 57
15 143 154 27 42
16 140 153 29 44 
17 142 140 29 42 
18 142 156 39 49 
19 139 152 41 56 
20 142 149 37 50 
21 141 183 24 39
22 138 151 35 51 
23 144 142 26 2 
24 134 151 38 50 
25 142 154 43 58 
26 143 138 28 50 
27 141 144 35 55
28 141 151 32 53 
29 146 153 32 42 
30 137 150 47 57 
31 135 158 38 52 
32 137 163 44 53 
33 137 160 45 60 
34 148 143 28 45 
35 150 142 38 49
36 140 156 52 63
37 127 174 45 59
38 141 143 86 57
39 143 155 41 51
40 139 159 45 58
41 [illegible] 142 35 49
42 [illegible] 137 39 52
43 145 146 39 50
44 138 146 44 57
45 140 140 36 53 
46 141 149 36 48
47 153 150 27 46
48 140 156 33 40 
49 145 166 44 63 
50 137 169 20 45 
51 146 159 38 52 
52 145 130 30 44 
53 140 159 32 45 
54 141 155 43 57 
55 156 140 31 52 
56 136 149 37 58 
57 140 152 33 57 
58 143 138 30 44 
59 145 145 32 46 
60 145 140 38 55 
61 140 160 50 68 
62 146 122 23 41 
63 140 147 36 50 
64 139 162 47 59 
65 147 143 37 52 
66 143 147 42 61 
67 141 137 34 46 
68 145 143 36 49 
69 137 142 34 50 
70 157 120 17 31 
71 139 152 41 48 
72 146 141 25 43 
73 146 137 18 29 
74 139 164 39 55 
75 129 136 36 46 
76 145 163 40 52 
77 139 151 15 26

7. Analyze the data in Problem 6, using the Johnson-Neyman technique
and setting up the region of significance if it exists. Contrast this
technique with that of analysis of variance and covariance (see Ref. 6).


8. How can the analysis of variance technique be used in problems of
estimation, that is, in the detection and estimation of components of
random variation associated with a composite population? (See
Ref. 1.)

CHAPTER XII
THE PRINCIPLES OF EXPERIMENTATION


There is an increasingly general realization that a formal experiment 
is an exacting enterprise designed and carried through with meticulous 
care to answer a few definite questions. The ability to formulate
productive hypotheses and to design experiments to test them is the mark
of a first-rate research worker or scientist. An understanding of the
principles underlying modern designs is essential at every stage of an 
experiment if the primary data are to be collected in such a way as to 
provide the basis for valid inference and so as to enable the maximum 
amount of information to be elicited from them most efficiently. Perhaps 
a clearer grasp of the requirements underlying sound experimentation 
can be gained by the scientific reader through studying and examining 
designs that lead to valid conclusions. He should apply the techniques
to actual problems, however, since difficulties usually tend to disappear 
on such closer experience. 

The whole subject of complex experiments is undergoing rapid devel- 
opment as new possibilities of the methods and of their correct application 
become better understood. The principles of experimentation, which 
originated in agriculture, are finding increasing application in many 
fields of science. The difficulties met with in application in one field 
are not identical with those in other fields, but many are similar. The 
solutions of problems arrived at in one field are often of material help in 
another. Where fields differ fundamentally, new techniques are neces- 
sary. Such needs are discovered only in direct contact with the obstacles 
themselves. Because modifications and extensions of the principles of 
design are capable of, and will undoubtedly have, ever wider application, 
the student of modern methods and statistical analysis needs to know 
how to apply these principles and how to read intelligently the reports of 
research workers who have used them. 

Modern ideas of experimental design differ sharply from earlier or 
traditional ones. It has long been an admonition in philosophical 
treatises of scientific experiment to hold constant all except one of the 
factors in a complex so that its effect may be determined. The experi- 
menter is advised to arrange an experiment so as to make it as sensitive 
as possible with respect to one question but as insensitive as possible with 
respect to all others. Just as mathematical development has been 
biased toward physics, so has the direction of experimentation been 
largely determined by the pattern of experimentation set up in physics, 
which emphasizes the importance of varying the essential conditions 
only one at a time. The difficulty in applying such a principle, par- 
ticularly in branches of science where the data of the research worker 
are subject to all sorts of fluctuations, had long been recognized by 
critical workers. The liberation of the research worker from stereotyped 
experimentation is relatively recent. 

The problem underlying the development of procedures appropriate 
to deal with types of variable material is twofold: one aspect concerns 
the design or logical structure of the experiment, the other the 
analysis and interpretation of the results. The development of the 
logical structure underlying the whole technique of modern experimental 
design and of the appropriate statistical tools for the analysis and interpre- 
tation of the results of such experiments is largely due to R. A. Fisher. 
Beginning his work in 1919 with the founding of the statistical laboratory 
at Rothamsted (Harpenden, England), Professor Fisher has revolution- 
ized the science of statistics and the principles of designing biological 
experiments. His principles of experimentation and methods of statis- 
tical analysis are finding increasing application in many fields of science, 
particularly wherever the basic materials are variable. The possibilities 
of applying these principles also to the improvement of physical and 
chemical experimentation have barely been recognized. In biophysics 
and biochemistry these principles are likely to become increasingly 
important. 

The subject of the design of experiments is too large and too impor- 
tant to scientific workers for it to receive incidental treatment only. In 
his text The Design of Experiments Fisher presents the framework of 
scientific inference and the principles of modern experimentation. Our 
discussion is limited to a brief consideration of the major characteristics 
of modern experimental designs. We are especially interested in the 
role which statistical procedures play in serving the requirements of 
sound experimental design and in furnishing the means for unambiguous 
interpretation. 

The Self-contained Experiment. A principle of general utility in 
statistical analysis is to rely upon the evidence from the data themselves 
when allowances are to be made for certain inequalities, as in certain 
comparisons under consideration. Arbitrary corrections based on an 
a priori basis without reference to the information provided by the data 
themselves cannot lead to convincing conclusions. Violations of 
statistical principles of this kind, though not so obvious a misuse of 
statistical analysis as is an arbitrary selection among observational data 
previous or subsequent to collection, are probably the source of the 
political principle that “anything can be proved by statistics,” or of the 
crescendo 

Fisher sets up the self-contained experiment as the model for the 
research worker and describes the properties which such a model must 
possess. Although progress in science may result from the better order- 
ing of the experiences we have had, it is chiefly in the collection of new 
experiences that advancement takes place. However, if these experiences 
are to afford a secure basis for bringing new knowledge into being, they 
must be planned in advance in accordance with principles that make 
such outcomes possible. Thus, experimental observations are essentially 
experiences formulated at the time of arranging for their collection. 
Experimental observations are related to existing bodies of scientific 
knowledge as new observations are carried out to test theories growing 
out of the previous collection of data. Theories in turn become modified 
and reformulated as an outcome of the new observations. But once 
an experiment has been designed and executed, its interpretation must be 
based on its own evidence. The purpose, therefore, of making an experi- 
ment self-contained is to make possible the valid and unequivocal inter- 
pretation of its results without referring for decision or settlement or 
consideration to other experiments or to the aggregate of experiences of 
prior collection. The principle that an experiment should be self-con- 
tained determines the essential difference between mere statistical 
observations and those which are collected in accordance with a clearly 
conceived plan. 

The Function of Controls. A primary requisite of the principle that 
an experiment should be self-contained is the necessity of supplying a 
control or controls, that is, the need to base all conclusions concerning 
the differential effect of two or more contrasting treatments on the 
differences in the response or reaction of two or more similar bodies of 
experimental material. By the use of controls, experiments become 
comparative and not merely absolute. Absolute information is usually 
of little interest or importance. The reasoned explanation of the func- 
tion of controls is clearly illustrated by the following example (Ref. 2). 

Assume that an experimenter working with animals injected some 
fluid into 3 rabbits and found that all 3 got violent and prolonged con- 
vulsions followed by death within an interval of 24 hours. In support 
of his conclusion that the injected substance was the cause of the death of 
the animals, the experimenter might draw from his own previous experi- 
ences or from those of rabbit breeders in general. Admittedly, only 
rarely would three designated animals die in the way described within 
such a short period of time. How would the conclusion have been 
made stronger if the experimenter had taken the precaution to inject 
a number of control rabbits with a neutral substance at the same time 
at which he injected his experimental animals? The answer to this 
question provides the rationale underlying the use of controls. It is 
that the controls are used to exclude, at a designated level of probability, 
a number of alternative interpretations of the experimental results, 
possibilities which have individually and collectively an unknown 
probability of having occurred. For example, the rabbits might have 
been ill from tetanus, hydrophobia, cholera, or some other unsuspected 
epidemic disease; perhaps the needle was infected with a poisonous 
substance; or it might be that the experimenter’s stock was genetically 
of a kind which reacted in this way in general to injections. Suppose, 
however, that the experimental rabbits had been randomly chosen from 
the whole herd, the controls included. Then, if their reaction was 
clearly different from that of the controls, there would be available a 
precise measure of the probability that causes other than the 
experimental factor had brought about the observed result. The 
probability is based exclusively on the number of rabbits used and is 
completely independent of all prior experience of these animals. Assume, 
for instance, that 5 control rabbits had been selected at random from 
the total number and, after having been injected with distilled water, 
had not died of convulsions. The measure of probability is obtainable 
from a simple application of permutations and combinations. 

There are 56 ways of choosing a group of 3 objects out of 8. If the 
3 objects were to be selected consecutively, there would be successively 
8, 7, and 6 objects to choose from and, therefore, the succession of 
choices could be made in 8 × 7 × 6, or 336, ways. This number represents 
not only every possible set of 3 but also every possible set in every 
possible order. Three objects can be arranged in order in 3 × 2 × 1, or 
6, ways. The number of possible choices is found by dividing 336 by 6, 
which is 56. The result, 56, is essential for the interpretation of the 
experimental results. The 56 sets of 3 which might be chosen would be 
distributed among the possible events as follows: 


     Number 
     Dying        f 
       0         10 
       1         30 
       2         15 
       3          1 
     Total       56 


The probability of the observed difference, if it were not attributable 
to the material injected, is, therefore, 1 in 56, or a probability level of 
.018, which by the usual standards may be regarded as significant. It is 
also worth noting that the use of the controls serves to transform the 
quality of the experimental evidence by making it strictly objective for 
others who have not undergone the experiences of the experimenter. 
The weight of previous or outside evidence is even less when the 
object of the experiment is quantitative, because such evidence is usually 
very indefinite or highly variable. Thus, the essential condition for 
controlling the interpretation of experimental results is the provision of 
comparisons between two or more unlike variants. 
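
The counting argument for the rabbit example can be checked with a short 
sketch. The function name and its arguments are illustrative only and are 
not part of the original text; the figures (8 animals, 3 injected, 3 dying) 
are those of the example. 

```python
from math import comb

def dying_distribution(n_total=8, n_dying=3, n_chosen=3):
    """For each j, count the sets of n_chosen animals that contain
    exactly j of the n_dying animals that died."""
    return {
        j: comb(n_dying, j) * comb(n_total - n_dying, n_chosen - j)
        for j in range(min(n_dying, n_chosen) + 1)
    }

counts = dying_distribution()
total = sum(counts.values())   # number of ways of choosing 3 out of 8
p_value = counts[3] / total    # chance that the 3 chosen are exactly the 3 dying

print(counts)              # {0: 10, 1: 30, 2: 15, 3: 1}
print(total)               # 56
print(round(p_value, 3))   # 0.018
```

The frequencies agree with the table, and the last line gives the 1-in-56 
probability on which the test of significance rests. 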

The Valid Estimate of Experimental Errors. The second requirement 
of a self-contained experiment is that it must hold within itself the 
possibility of securing a valid estimate of the experimental errors which 
really influence the comparisons made. That is, it is necessary to esti- 
mate the error from the data of the experiment itself, because it is only 
under such conditions that proper confidence can be put in the result of 
the experiment. In any experiment there are factors which are suscep- 
tible to some degree of control by the experimenter. But their effect 
cannot be entirely eliminated, owing to chance fluctuations. Many of 
the factors giving rise to these fluctuations which affect performance 
are small in size and random in incidence, so that it is impossible to 
present an exhaustive list of all the sources of variation in the experi- 
mental material. It is customary to designate the component of varia- 
tion associated with the random variation of the experimental material 
as experimental error. The errors do not follow any known exact laws, 
and so the laws of chance are usually designated as descriptive of their 
distribution. 

As was pointed out in the discussion of analysis of variance, it is 
assumed that the experimental errors to which the experimental observa- 
tions are subject shall be independently and normally distributed with 
the same variance. The importance of the experiment making possible 
a valid estimate of the experimental errors is indicated by the fact that 
only under such conditions is it possible to apply to the experimental 
results tests of their significance which are disconnected from all past 
experience and are hence capable of adding new knowledge. Therefore, 
the design of a self-contained experiment involves the consideration of 
means of affording a valid estimate of error as well as ways of making 
possible an unbiased comparison between contrasted treatments. The 
validity of other estimates of error would depend on other mathematical 
assumptions which the particular method of estimation would introduce. 
There would be no objective reason for accepting such assumptions as 
true, if the experimenter has not taken the precautions needed to make 
them true. 

Replication. The first requirement of an experiment designed so 
that a valid test of significance may be applied in its interpretation is 
replication, the process of repeating the same treatment on more than one 
object of the experimental test. The word “plot” is used in agricultural 
experimentation to indicate an individual plot or area of land. The 
“plot” could be an experimental animal or an individual, for instance. 
Replication is essential in the first place since it is a means of diminishing 
the experimental error. Just how this is done may become clear by 
considering, first, certain factors contributing to the actual errors of the 
experiment. The amount of information which a particular experiment 
affords is known as its precision. Fisher succeeded in quantifying the 
concept of information so that now the precision is wholly a quantitative 
factor in the value of an experiment. 

There are a number of factors, both quantitative and qualitative, 
which may contribute to make the actual errors small. Some of these are 
the measurement of the criterion; the improvement in the techniques of 
controlling nonexperimental factors; care in ensuring that in the experi- 
mental material the general conditions are those occurring in population 
practice; the measurement of controls under as nearly as possible the 
same conditions as those for the unknowns, including time; and the 
greatest possible avoidance of hidden systematic errors as well as 
subjective errors. Only when sufficient care has been given to ensure that 
working errors have been reduced to unimportant quantities can 
improvement of the replication and the organization of the structure or 
arrangement of the experiment be expected to achieve greatly increased 
precision or sensitiveness. The process of reducing working errors begins with 
reducing the largest sources of error, and it continues until sources of 
error that hitherto seemed inconsequential become significant by limiting 
the value of the whole enterprise. 

The second function of replication in an experiment is to provide 
the data from which the appropriate estimate of experimental error can 
be calculated. Thus replication performs the double service of reducing 
experimental error and of furnishing an estimate of the error that remains. 
Replication is the sole source of the estimate of error. To make certain 
that the estimate of error is unbiased requires as much attention in the 
design of an experiment as does the guarantee that any of the direct 
estimates are without bias. Furthermore, the unbiased estimate of 
experimental error is fundamental for the application of valid tests of 
significance by which the value and significance of the experiment are 
determined. Likewise, an unbiased estimate of error is a necessary 
condition if one is to assess the weight that may be given to the evidence 
of an experiment should its results differ from those of other experiments 
of the same sort. 

Since the precision of an experiment, as measured inversely by the 
standard error of a mean of any one treatment, increases in proportion to 
the square root of the number of replications, it is clearly indicated 
that a larger difference in treatments would be necessary to demonstrate 
a significant effect based on a smaller than on a larger number of 
replications. 
The argument is sometimes advanced that the results are good enough 
if there is reason to believe that the estimate of error is at least not an 
underestimate. Fisher points out that the danger of the fallacy of 
assuming to be “on the safe side” is that there is no security in admitting 
a bias in either direction. The effect of overestimating the error may be 
to prevent the experimenter from drawing a conclusion which the 
experiment justly substantiates. Such a practice could lead to the belief 
that an effect is inconsequential when it is not, and so to ignore the real 
cause of disturbance in the design of subsequent experiments. Because of the 
exploratory and tentative character of much research, a promising line 
of inquiry might be given up through a failure to discern the clue which 
the experiment might otherwise have provided. 
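
The square-root relation between replication and precision mentioned above 
can be illustrated numerically. This is only a sketch: the standard 
deviation of 10 units for a single plot is an assumed figure, and the 
function name is hypothetical. 

```python
import math

def standard_error_of_mean(sd, n_replications):
    """Standard error of a treatment mean based on n replications."""
    return sd / math.sqrt(n_replications)

sd = 10.0  # assumed standard deviation of a single plot (illustrative)
for n in (4, 16, 64):
    print(n, standard_error_of_mean(sd, n))
# 4 5.0
# 16 2.5
# 64 1.25
```

Quadrupling the number of replications halves the standard error, so that 
a treatment difference half as large can be detected with the same 
assurance. 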

Randomization. It is essential in an experiment to recognize that 
equalization is approximate to a greater or lesser degree, no matter how 
much care and experimental skill are exerted in attempting to equalize 
the nonexperimental conditions which are likely to influence the result. 
In many significant practical situations the attempts at equalization are 
definitely inadequate. It becomes of fundamental importance that this 
inequality shall not lead to biased estimates and invalid tests of sig- 
nificance. The essential safeguard is included in the experimental 
procedure by a process which is known as randomization. Just how this 
operation works can be explained by considering again the origin of error. 
The real errors of the experimental results originate from differences 
in the nonequalization of the nonexperimental conditions among the 
objects or groups of objects that are treated differently. The estimates 
of error are secured from the discrepancies among the objects treated 
alike. Consequently, it is necessary only to make certain that any two 
objects that may be treated alike have the same probability of being so 
treated. Likewise, if treated differently, the objects must have the same 
probability of being so treated, in each of the ways in which this is pos- 
sible. This precaution is necessary to assure that each component of 
error which may influence the experimental results may with equal 
frequency furnish the data used in the estimate of error. The calculus of 
probability and the mechanism of the statistical theory of sampling dis- 
tributions can then be applied with confidence. 

Randomization, then, is the procedure of making certain that the 
probabilities of being subjected to like treatment are equal for every 
relevant pair of objects in the experiment. It is worthy of note that the 
object of randomization is not to increase the precision of the experiment 
but only to guarantee that whatever precision the experimental arrange- 
ment is capable of providing is neither over- nor underestimated. Sys- 
tematic arrangements of plots or objects in contrast to random 
arrangement have been shown to give consistently either an over- or an 
underestimate of error. 
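
A minimal sketch of such a random allocation follows, using the 8 rabbits 
of the earlier example. The function name, the labels, and the seed are 
assumptions made for illustration; any chance mechanism that gives every 
set of 3 animals the same probability of selection would serve equally well. 

```python
import random

def randomize(units, n_treated, seed=None):
    """Allot n_treated of the units to treatment at random; the rest
    serve as controls. Every subset of size n_treated is equally likely."""
    rng = random.Random(seed)
    treated = set(rng.sample(units, n_treated))
    return {u: ("treated" if u in treated else "control") for u in units}

rabbits = list(range(1, 9))                     # animals numbered 1 to 8
allocation = randomize(rabbits, n_treated=3, seed=1)
for rabbit, group in allocation.items():
    print(rabbit, group)
```

Because every set of 3 is equally likely, any two animals have the same 
probability of being treated alike as any other two, which is the 
condition stated above. 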

Controls, replication, and randomization have been discussed as 
the essential aspects of the principle that an experiment should be 
self-contained. 

Relationship between Experimental Design and Statistical Analysis. 
The relation between experimental procedure and statistical analysis 
will now be considered more fully. It is apparent from the discussions 
of experimental design that a substantial number of the ideas or concepts 
are of a statistical nature. In fact, a clear understanding of the sta- 
tistical procedures used is an essential part of the understanding of the 
principles of experimentation. These procedures serve to fulfill the 
requirements of intelligible and accurate experimental design and to 
provide the machinery of unequivocal interpretation. We note, then, 
that the question of experimental procedure and that of statistical analy- 
sis are two aspects of the single problem—the problem of fulfilling the 
requisites of the operations involved in making additions to scientific 
knowledge by experimentation. 

An analysis of the relationship between the two aspects reveals that 
once the practical experimental procedure is established, only one method 
of statistical analysis can be valid. Furthermore, a fact of great practical 
significance is that the validity of the statistical analysis depends upon 
the introduction of a random element in the arrangement of the objects 
of the experiment. A definite and complete statement of the specific 
process of randomization followed determines in advance the correct 
statistical method to be applied to the experimental results. The logical 
organization of each of the possible types of randomization is set forth 
by the analysis of variance. The neatness of the arrangement of 
calculations and the facility of their interpretation in the 
analysis-of-variance table are greatly appreciated by the modern research 
worker. The compactness and simplicity of this form of summarizing the 
results as well as the logical structure of the experiment have added 
greatly to the intelligibility and accuracy of its interpretation. The 
logical structure of the experiment is shown by the division of the total 
number of degrees of freedom, the independent comparisons, corresponding 
to each of the sums of squares calculated. 

The development of principles improving the art of experimentation 
has been concomitant with that resulting in tools suitable to analysis of 
experimental results. The standardized methods of statistical analysis 
were designed largely on the basis of a mathematical theory in which the 
problems underlying experimental designs of more recent origin had not 
been explicitly considered. It has been previously pointed out how 
"Student's" discovery of the t-distribution and Fisher's extension to the 
z-distribution made exact tests of significance possible, both for small and 
for large samples. The modern advances in experimental design have 
brought about an increased awareness in practical work of the numerous 
different sources of variation affecting experimental and observational 
material. Exact tests of significance and the technique of the analysis 
of variance are indispensable in the assessment of these various 
components of variation. 

We should not overlook the mathematical framework upon which the 
modern tools of scientific value have been built. This framework gives 
precision to tests of hypotheses concerning factors giving rise to variation 
and to experiments planned to yield maximum information. 

The statistical treatment of the results of replicated experiments is 
usually established on the assumption of the normal law of error, and the 
general formulation of the analysis is drawn from the method of least 
squares. It is essential for the correct application of the method of least 
squares that any components of variation not removed by the experi- 
mental design be normally and independently distributed. If these 
conditions are not fulfilled, the theoretical basis underlying tests of sig- 
nificance breaks down and hence estimates and tests of significance are 
invalidated. Thus, in the test of significance associated with the analysis 
of variance, it was assumed that the measured effects of the factors 
under experiment were statistically independent and normally distributed 
variates, all with the same variance but with possibly different means. 
Unless, therefore, the arrangement of experiments is balanced to fulfill 
the assumptions, the statistical reduction of the data would be very 
difficult, and convincing results would be impossible. Such a balanced 
arrangement is illustrated in Equation (10.04), page 214, where the entire 
calculation is much simplified by the fact that when the equation is 
squared and the terms are summed, the cross-products become zero. 
Another significant property is that the difference between the means 
for any one factor is independent of the other factors. 

The validity of the method of least squares as the basis for the testing 
of hypotheses by experimental results was secured by Fisher through the 
introduction of randomization into the design. It has been pointed 
out that systematic arrangements are apt to lead to biased results, because 
the necessary element of randomization is lacking and hence the test of 
hypotheses through results based on the method of least squares does not 
produce the same objective validity as does a test on experimental 
observations obtained from random arrangements. 

In spite of the fact that the relation between the material conduct 
of an experiment and its statistical interpretation must be used in plan- 
ning conclusive experiments, some experimenters continue to work with 
variable material without such design and to obtain discordant results 
incapable of being fitted into a scientific system. Controversies some- 
times arise because different experimenters get diverse results for the 
same problem. In other cases, methods of statistical analysis are 
employed which result in definitely misleading estimates of error. Also, 
methods of experimentation are used which cannot give a valid test of 
experimental results. The common procedure of consulting a statistician 
or statistical principles after an experiment or investigation has been 
completed is equivalent to holding a post-mortem analysis. Perhaps 
the only interpretation of the data that can be made is to state from what 
the experiment died. But when research workers turn to sound methods 
of statistical analysis which involve carefully planned experimental 
designs, difficulties of the type enumerated above tend to disappear. 

Therefore, we can state that the most important work of the statis- 
tician is to prepare the plan of the experiment or investigation in such a 
way as to get the best answers to the questions raised. It has been 
demonstrated that a complete overhauling of the process of collecting, or 
of the experimental design, can often increase the precision tenfold or 
twelvefold for the same expenditure in time and labor. The modern 
research worker, therefore, needs statistical knowledge not only for work- 
ing out the results but also for designing: unless he has a working knowl- 
edge of the technique he employs, he cannot conduct his experiment 
properly. In planning an experiment, it is especially important to give 
due attention to possible results and their interpretation. The experi- 
menter must be induced to use his imagination, and to anticipate the 
confusion and difficulties that will assail his investigation if they are not 
foreseen. 

CHAPTER XIII 
APPLICATIONS OF THE PRINCIPLES OF EXPERIMENTATION 


We now proceed to show the application of the principles of experi- 
mentation to certain cases of technical importance. Our emphasis is 
upon the interpretation of the experimental results and the fundamental 
part which statistical methods, particularly those of analysis of variance 
and covariance, play in this process. 

Let us take one of the simplest designs planned to compare the actions 
of two like individuals under contrasting conditions. A biologist might 
wish to determine the effect of the removal of a deep-seated organ of an 
animal. As a control he would perform a similar operation upon another 
animal of the same kind but in which the organ under investigation would 
not be disturbed. In this way the experimenter attempts to make the 
situations alike in all respects except the factor to be tested. Such 
perfect experimental control is an ideal desideratum which is never 
capable of complete fulfillment. It is, however, a basic principle upon 
which experimentation depends. 

The Single-Factor Experiment. The method of pairing takes into 
account two desiderata in experimental design: (1) The requirement of 
homogeneous experimental material so that the sensitivity of each 
individual observation may be enhanced, and (2) the need for multiplying 
the number of observations in order to reveal the reliability and the 
consistency of the results. The two coupled individuals would, presum- 
ably, react alike under the same treatment, and the difference observed 
under contrasting treatment measures the differential treatment effect. 
A minimum of two pairs, or replications, is required, since with a single 
pair it would be impossible to tell whether any difference in behavior 
detected should be ascribed to the difference in treatments, to the 
particular variability of the individuals, or to both jointly. The 
differences between the measurements of the respective pair members 
constitute the experimental data upon which inferences are to be drawn. 
Which individual of a particular pair shall receive the one or the other of 
the two treatments is determined by a random process. If treatments are 
randomly assigned, replication serves to equalize the effect of 
uncontrolled sources of variation. It is the variation among the several 
differences that is used in 
estimating experimental error. By comparing the mean difference 
attributable to the differential effect of the treatments with the standard 
error of the mean difference, the significance of the results of the experi- 
ment is to be determined. 
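
The comparison of the mean difference with its standard error may be 
sketched as follows. The eight differences are invented for illustration 
and are not data from the text; the function name is likewise hypothetical. 

```python
import math
from statistics import mean, stdev

def paired_t(differences):
    """Return (t, degrees of freedom) for testing that the mean of the
    within-pair differences is zero."""
    n = len(differences)
    d_bar = mean(differences)                    # mean difference
    se = stdev(differences) / math.sqrt(n)       # standard error of the mean difference
    return d_bar / se, n - 1

# Hypothetical differences (treated minus control), one per pair.
diffs = [2.0, -1.0, 3.0, 4.0, 0.0, 1.0, 2.0, 5.0]
t, df = paired_t(diffs)
print(round(t, 3), df)   # 2.828 7
```

The value of t so obtained is then referred to the t-distribution with 
n - 1 degrees of freedom, n being the number of pairs. 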

We have previously examined the statistical method for reducing the 
data obtained from an experiment purporting to be of a single-factor 
type (page 75). The difference between the achievement scores of two 
individuals paired on the basis of their potential learning capacities was 
computed for each of the 25 pairs. The null hypothesis was tested that 
these differences constituted a random sampling from a population of 
such differences distributed about a mean of zero in a normal manner. 
The criterion, t, was set up for testing the first aspect of this hypothesis. 

The method of replicated comparison of individuals, by pitting each 
individual against another individual of like kind in conditions made as 
equal as possible, is a simple and effective experimental design for testing 
the differential effect between two treatments. It is, however, limited to 
situations where the presumed effect of a single factor can be measured 
under the controlled conditions prescribed for the validity of the method. 
In practice, these conditions are not often present. Furthermore, it is 
usually desirable to test the effects of more than two treatments. The 
need for broadening the scope and comprehensiveness of experimental 
inquiries has led, therefore, to the extension of replicated comparisons of 
individuals or groups of individuals to more and more complex situations. 
In this extension, the subdivision of the experimental material into 
relatively homogeneous series is a fundamental part of the process, as was 
observed in the paired experiment. Just as the advance in systematic 
sampling has been made possible by utilizing prior knowledge of the 
population sampled, so the utilization of knowledge of how to subdivide 
the experimental material profitably has played an important part in 
the evolution of experimental design. The principle that the process of 
subdivision can be advantageously duplicated is also operative. The 
smallness of number or quantity of sufficiently homogeneous material 
circumscribes the number of different treatments rather than the number 
of replications that can be incorporated into an experiment. 

The Randomized-Block Arrangement. The experimental design 
known as the randomized block is a simple application of an experimental 
arrangement illustrating the principle of the subdivision of the 
experimental material into relatively homogeneous series. In this 
arrangement each treatment occurs equally frequently, more commonly once 
in each block, and the treatments are randomly allotted to the 
experimental units within the block. The term “block” may denote any group 
containing the required number of experimental units. In arranging the 
grouping so that similar experimental units are contained in the same 
block, the accuracy of the treatment comparisons is increased by 
eliminating from them the differences due to dissimilarities among the 
different blocks. The process of randomization guarantees that no treatment 
bias is introduced and permits an unbiased estimate of experimental 
error basic for the validity of the test of significance. 

Consider an experiment in nutrition on the relative effect of 4 different 
treatments A, B, C, and O (no treatment), which are randomly 
applied to 4 blocks of 4 children each chosen as nearly alike as possible 
with respect to age, height, and weight at the beginning of the experi- 
ment. The arrangement is represented in the following diagram: 


Block I Block II Block III Block IV 
Children | 1 2 3 4/5 6 7 8|9 1011 12 13 14 15 16 
Treatment OBCAIACOBIBACOIOCBA 
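The random allotment of treatments within blocks described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `randomized_blocks` and the `seed` argument are assumptions introduced here, and the layout it prints will generally differ from the one in the diagram.

```python
import random

def randomized_blocks(treatments, n_blocks, seed=None):
    """Return one independently shuffled copy of the treatments per block.

    Hypothetical helper: each block receives every treatment exactly once,
    in an order drawn at random for that block.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)          # random allotment within the block
        layout.append(order)
    return layout

# Four treatments (O = no treatment) in four blocks of four children.
layout = randomized_blocks(["A", "B", "C", "O"], n_blocks=4, seed=1)
for number, block in enumerate(layout, start=1):
    print(f"Block {number}: {' '.join(block)}")
```

Shuffling each block separately is what keeps every treatment represented once per block while leaving the order within the block to chance.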


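We give the analysis for the general case where k denotes the number
of blocks and p the number of treatments. Then the equation for the
sums of squares is

$$\sum^{pk}(X-\bar{X})^2 = p\sum^{k}(\bar{X}_b-\bar{X})^2 + k\sum^{p}(\bar{X}_t-\bar{X})^2 + \sum^{pk}(X-\bar{X}_b-\bar{X}_t+\bar{X})^2 \qquad (13.01)$$

in which the four terms, from left to right, are numbered (1), (2), (3),
and (4), and where $\bar{X}_b$ is the mean of a block, $\bar{X}_t$ is the mean
of a treatment, and $\bar{X}$ is the general mean. The corresponding equation
for the degrees of freedom is

$$pk-1 = (k-1) + (p-1) + (p-1)(k-1) \qquad (13.02)$$

with the terms again numbered (1), (2), (3), and (4).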


The following formulas are used to calculate the sums of squares:

(1) Total:       $\sum^{pk}(X-\bar{X})^2 = \sum^{pk}X^2 - \dfrac{T^2}{pk}$
                 (where T = grand total for all plots)
(2) Blocks:      $p\sum^{k}(\bar{X}_b-\bar{X})^2 = \dfrac{\sum^{k}T_b^2}{p} - \dfrac{T^2}{pk}$
                 (where $T_b$ = total for one block)
(3) Treatments:  $k\sum^{p}(\bar{X}_t-\bar{X})^2 = \dfrac{\sum^{p}T_t^2}{k} - \dfrac{T^2}{pk}$
                 (where $T_t$ = total for one treatment)
(4) Error:       $\sum^{pk}(X-\bar{X}_b-\bar{X}_t+\bar{X})^2 = (1) - (2) - (3)$
                 (subtract blocks and treatments from total)

These components are then set up in the conventional analysis-of-variance
table.
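The four computing formulas can be checked numerically with a short Python sketch. The function name `randomized_block_ss` and the small 2 × 3 data table are hypothetical, chosen only to exercise the formulas; they do not come from the text.

```python
def randomized_block_ss(table):
    """Sums of squares for a randomized-block layout.

    table[b][t] is the observation for block b and treatment t,
    with k blocks (rows) and p treatments (columns).
    """
    k = len(table)
    p = len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (p * k)            # T^2 / pk

    total = sum(x * x for row in table for x in row) - correction
    blocks = sum(sum(row) ** 2 for row in table) / p - correction
    treatment_totals = [sum(row[t] for row in table) for t in range(p)]
    treatments = sum(t * t for t in treatment_totals) / k - correction
    error = total - blocks - treatments                 # (1) - (2) - (3)

    return {
        "total": (total, p * k - 1),
        "blocks": (blocks, k - 1),
        "treatments": (treatments, p - 1),
        "error": (error, (p - 1) * (k - 1)),
    }

# Hypothetical data: 2 blocks, 3 treatments.
result = randomized_block_ss([[1, 2, 3], [2, 3, 5]])
for source, (ss, df) in result.items():
    print(f"{source:<11} SS = {ss:.4f}  d.f. = {df}")
```

Each entry pairs a sum of squares with its degrees of freedom, so dividing the first by the second gives the mean square that enters the analysis-of-variance table.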


The standard error of the experiment is

$$s = \sqrt{\dfrac{\sum^{pk}(X-\bar{X}_b-\bar{X}_t+\bar{X})^2}{(k-1)(p-1)}} \qquad (13.03)$$

The standard error for the mean of one treatment is

$$s_{\bar{X}_t} = \dfrac{s}{\sqrt{k}} \qquad (13.04)$$

TABLE 86
THE SCORES OF 25 PAIRS OF STUDENTS SUBJECTED TO TWO DIFFERENT TREATMENTS
IN A RANDOMIZED-BLOCK ARRANGEMENT

            Treatments         Difference     Sum
Pairs       X1        X2       X1 - X2        X1 + X2
(1)         (2)       (3)      (4)            (5)

  1          73        58        15            131
  2          52        37        15             89
  3         100        53        47            153
  4          60        77       -17            137
  5          75        51        24            126
  6          67        62         5            129
  7          61        55         6            116
  8          59        30        29             89
  9          33        39        -6             72
 10          19        16         3             35
 11          32        15        17             47
 12          27        37       -10             64
 13          68        44        24            112
 14          54        27        27             81
 15          26        43       -17             69
 16          30        27         3             57
 17          69        53        16            122
 18          43        29        14             72
 19          23        13        10             36
 20          11        17        -6             28
 21          26        20         6             46
 22          30         9        21             39
 23          28        35        -7             63
 24          53        21        32             74
 25          23        42       -19             65
Sum        1142       910       232           2052
Sum of
squares  64,226    40,474      8962        200,438

We may further illustrate the principles of the randomized-block
design by applying them to the experiment presented on page 75. Here
there are only two treatments, which are assumed to have been randomly
assigned to the members of the respective 25 pairs. Each pair corresponds
to a block, and each individual in a pair or block is an experimental
unit. The logical structure of this type of experimental design, specified
by the process of randomization carried out, is sorted out by the analysis
of variance. In this case each item is classified by two criteria; for
example, an individual achievement score is classified by treatment and
membership in a particular pair. The analysis is carried out as follows:

Step 1. The measures of treatment effects for the respective members
of the 25 pairs are given in columns (2) and (3) of Table 86. The difference
and the sum of the treatment effects are given in columns (4) and (5).
The sum and sums of squares are calculated and recorded in the last two
rows, respectively.

Step 2. Calculate the sum of squares for differences:

$$\sum(X_1-X_2)^2 - \dfrac{[\sum(X_1-X_2)]^2}{n} = 8962 - \dfrac{(232)^2}{25} = 6809.04$$

This sum is divided by 2, the number of achievement [...]. This is done
to obtain the per-individual [...] these differences, since the variance
[...] $2\sigma^2$ (see page 37). The quotient of [...] respective pairs
[...] would have been no interaction. Thus, the
source of measurement of the experimental error is the uncontrollable
variation of these differences.


TABLE 87
ANALYSIS OF VARIANCE OF THE SCORES IN ALGEBRA OF THE 25 PAIRS OF STUDENTS

Source of variation      D.F.   Sum of squares   Mean square    F       Hypothesis
Interaction or
  experimental error      24       3,404.52         141.855
Between pairs             24      16,004.92         666.870     4.70    Rejected
Between treatments         1       1,076.48       1,076.480     7.58    Rejected
Total                     49      20,485.92

Step 3. Compute the sum of squares from the sums:

$$\sum(X_1+X_2)^2 - \dfrac{[\sum(X_1+X_2)]^2}{n} = 200{,}438 - \dfrac{(2052)^2}{25} = 32{,}009.84$$

Here, as in the case of the difference, since the sum is made up of two
achievement scores, the comparable sum of squares for pair variation is
$\tfrac{1}{2}(32{,}009.84) = 16{,}004.92$. This value is entered in Table 87 as the
"between pairs" source of variation.

Step 4. The sum of squares to measure the variation assigned to
treatment effects is obtained as follows. The mean of the two treatment
totals is

$$\tfrac{1}{2}\sum(X_1+X_2) = \tfrac{1}{2}(2052) = 1026$$

The two deviations are 1142 - 1026 = 116 and 910 - 1026 = -116.
The sum of their squares is 26,912. This sum is required on a per-pair
basis and is therefore divided by 25. The quotient, 1076.48, is entered as
the measure of variation due to treatment in Table 87.

Step 5. The total sum of squares calculated independently provides
a check on the calculations. It is given by

$$\left[\sum X_1^2 + \sum X_2^2\right] - \dfrac{[\sum(X_1+X_2)]^2}{2n} = 104{,}700 - \dfrac{(2052)^2}{50} = 20{,}485.92$$

This value is recorded in the total row of Table 87.

Step 6. The total number of degrees of freedom is 1 less than the 
number of individual achievement scores, or 50 — 1 = 49. The 25 
differences and the 25 sums each contribute 24 degrees of freedom; the 
two treatments, 1. Thus, the additive property applies to the degrees of 
freedom as well as to the sum of squares. 

Step 7. Tests of significance can now be applied to the results
recorded in the analysis-of-variance table. The differential effect due
to variation in treatment is found to be F = 1076.48/141.855 = 7.58, a
value significant at the 5 per cent level. The table values for F
corresponding to d.f. 1 and 24 are: $F_{.05}$ = 4.26; $F_{.01}$ = 7.82. A similar
finding was given by the t-test (page 78), where t = 2.75 and $t_{.01}$ = 2.797
for d.f. = 24. This is a demonstration of the fact pointed out on page
55, that if there is only 1 degree of freedom as in this experiment of two
treatments, $F = t^2$. Thus, $t^2$ = 86.1184/11.3484 = 7.58.

The test of significance for the differences between the means of the
pairs is given by F = 666.87/141.855 = 4.7. This value is significant
at the 1 per cent level; the value of F for d.f.'s of 24 and 24 is $F_{.01}$ = 2.66.
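Steps 2 to 7 can be retraced from the scores of Table 86 with the following Python sketch. The variable names and the helper `corrected_ss` are introduced here for illustration; the data are the 25 pairs of the table, and the quantities computed are the sums of squares and ratios named in the steps.

```python
# Scores of the 25 pairs from Table 86 (treatments X1 and X2).
x1 = [73, 52, 100, 60, 75, 67, 61, 59, 33, 19, 32, 27, 68,
      54, 26, 30, 69, 43, 23, 11, 26, 30, 28, 53, 23]
x2 = [58, 37, 53, 77, 51, 62, 55, 30, 39, 16, 15, 37, 44,
      27, 43, 27, 53, 29, 13, 17, 20, 9, 35, 21, 42]
n = len(x1)

def corrected_ss(values):
    """Sum of squares about the mean: sum(v^2) - (sum v)^2 / count."""
    return sum(v * v for v in values) - sum(values) ** 2 / len(values)

diffs = [a - b for a, b in zip(x1, x2)]
sums = [a + b for a, b in zip(x1, x2)]

ss_error = corrected_ss(diffs) / 2        # Step 2, halved to a per-score basis
ss_pairs = corrected_ss(sums) / 2         # Step 3, halved likewise
mean_total = (sum(x1) + sum(x2)) / 2      # Step 4
ss_treat = ((sum(x1) - mean_total) ** 2 + (sum(x2) - mean_total) ** 2) / n
ss_total = corrected_ss(x1 + x2)          # Step 5, independent check

ms_error = ss_error / (n - 1)             # 24 degrees of freedom
f_treat = ss_treat / ms_error
f_pairs = (ss_pairs / (n - 1)) / ms_error

# Paired t-test on the differences, squared, for comparison with F.
mean_diff = sum(diffs) / n
t_squared = mean_diff ** 2 / (corrected_ss(diffs) / (n - 1) / n)

print(f"error {ss_error:.2f}, pairs {ss_pairs:.2f}, "
      f"treatments {ss_treat:.2f}, total {ss_total:.2f}")
print(f"F(treatments) = {f_treat:.3f}, t^2 = {t_squared:.3f}, "
      f"F(pairs) = {f_pairs:.3f}")
```

The three component sums of squares add up to the independently computed total, and the squared t from the paired comparison coincides with the treatment F, which is the identity the step appeals to.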

The separation of the source of variation among the pairs illustrates
the contribution of the experimental design to the precision of the
experiment. If this source of variation had not been isolated, the variations
among the pairs would have been included in the experimental error,
thus substantially reducing the precision (see Table 88). Thus by using
the randomized-block design in this case and putting equated individuals
in each block, the variation among pairs has been controlled and isolated.




TABLE 88
ANALYSIS OF VARIANCE OF THE ACHIEVEMENT-TEST SCORES OF THE 25 PAIRS OF
STUDENTS WITHOUT THE ISOLATION-OF-TREATMENT EFFECT

Source of variation   D.F.   Sum of squares   Mean square    F      Hypothesis
Between pairs          24      16,004.92         666.87      3.38   Rejected
Within pairs           25       4,481.00         179.24
Total                  49      20,485.92


An objective basis for determining the increase in precision in using 
randomized blocks as compared with the use of two groups of random 
samples of students for the experimental comparison has been given by 
Yates (Ref. 21). The calculations are as follows: 

The error variance, 141.855, is substituted for the mean square of
error (24 D.F.) and the mean square for treatment (1 D.F.). The
corresponding sum of squares is found by multiplying the error variance 
by the combined degrees of freedom. Thus, (141.855)(25) = 3546.375. 
This product is added to the sum of squares for “between,” 16,004.92. 
Thus, 16,004.92 + 3546.375 = 19,551.295. This sum is then divided 
by the total degrees of freedom, 49. Thus, 19,551.295/49 = 399.005. 
The efficiency of randomized blocks as compared to random sampling 
equals 399.005/141.855 = 2.81 or 281 per cent. 
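The arithmetic of this comparison is short enough to sketch in Python. The function name and argument names below are assumptions made for illustration; the calculation itself follows the steps just described, using the entries of Table 87.

```python
def efficiency_vs_random_groups(ms_error, ss_blocks, df_blocks, df_error, df_treat):
    """Ratio of the error variance expected without blocking to the
    error variance actually obtained with randomized blocks."""
    # Error variance stands in for both the error and treatment mean squares.
    pooled_ss = ss_blocks + ms_error * (df_error + df_treat)
    ms_without_blocks = pooled_ss / (df_blocks + df_error + df_treat)
    return ms_without_blocks / ms_error

eff = efficiency_vs_random_groups(
    ms_error=141.855, ss_blocks=16004.92, df_blocks=24, df_error=24, df_treat=1
)
print(f"relative efficiency = {eff:.2f}")
```

A ratio above 1 indicates that pairing reduced the error variance; here the pooled variance is roughly 2.8 times the randomized-block error variance.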

Symmetrical Incomplete Randomized-Block Design. A useful modification
of the randomized-block type of arrangement is the one known
as the symmetrical incomplete randomized-block design. In this arrangement
each block contains two units only, and all possible combinations of
the treatments, taken [...] [natur]ally divisible into groups,
[... treat]ments all of which might [...]. The study of several treatment
effects on twins or triplets is an example.

[...] The experimental principle that the [...] of the experimental
material may be advantageously duplicated is best illustrated by the
Latin square. This type of [...] design is similar in principle to a
randomized-block arrangement, but in a Latin square two cross-[...]
once and once only in each row and in each column. Thus, the
differences [...] in each row [...] eliminated from the experimental
comparisons.

The appropriate process of randomization, necessary to ensure the
validity of the test of significance applied to the experiment, consists in
taking any square arrangement which fulfills the conditions of a Latin
square and rearranging either the rows or the columns, or both, at
random, and then assigning the treatments at random. The special
methods which have to be used to assure complete randomization can
be carried out by using the typical "transformation sets" tabulated by
Fisher and Yates (Ref. 8).
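The basic step of this randomization, rearranging rows and columns at random and then assigning treatments at random, can be sketched as follows. This is an illustrative sketch only: the cyclic starting square, the function names, and the `seed` argument are assumptions, and the sketch does not reproduce the "transformation sets" of Fisher and Yates. Starting from a single square, it can only reach squares obtainable from that square by such rearrangements.

```python
import random

def random_latin_square(treatments, seed=None):
    """Shuffle rows, columns, and treatment labels of a cyclic Latin square."""
    rng = random.Random(seed)
    labels = list(treatments)
    n = len(labels)
    # Cyclic starting square: cell (i, j) holds symbol (i + j) mod n.
    base = [[(i + j) % n for j in range(n)] for i in range(n)]
    row_order = list(range(n))
    col_order = list(range(n))
    rng.shuffle(row_order)      # rearrange rows at random
    rng.shuffle(col_order)      # rearrange columns at random
    rng.shuffle(labels)         # assign treatments to symbols at random
    return [[labels[base[r][c]] for c in col_order] for r in row_order]

def is_latin(square):
    """True if every symbol occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok

square = random_latin_square("ABCD", seed=7)
for row in square:
    print(" ".join(row))
```

Permuting rows, columns, or labels never breaks the once-per-row, once-per-column property, which is why the result can be checked with `is_latin` whatever the seed.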

The structure of a Latin-square design is illustrated in Figs. 6 and 7 
and the appropriate statistical analysis follows. 


[Figure 6. Record for a single individual (suits presented; column totals
15, 5, 20, 10; total 50).]
[Figure 7. A 4 × 4 Latin square.]

Consider an experiment designed to test the telepathic powers of a
large sample of individuals. Suppose that the experiment consists in
presenting 50 playing cards in sequence, each card being drawn at random
from the pack and then returned. Each subject reports his guess of the
suit of the card drawn each time. Figure 6 is the record of a single
individual. His score of correct assignments is the total of the
frequencies in the diagonal cells, for example 12. No 2 cells of a set in the
contingency table are in the same row or column and no cell is common in
[...]. The sets may be defined by the letters of a Latin square as in
Fig. 7.

More generally, let the letters A, B, C, D represent treatments in the
4 × 4 Latin square. The "plots" are arranged in 4 rows and 4 columns
and there must be as many treatments as there are rows and columns.
The treatments are randomly assigned to the plots subject to the double
restriction that the treatment can occur only once in any row or column.
We give the analysis for the general case where n represents the number
of rows, columns, and treatments. The equations for the sums of
squares and degrees of freedom are as follows:

$$\sum_{i=1}^{n}\sum_{j=1}^{n}(X_{ij}-\bar{X})^2 = n\sum(\bar{X}_r-\bar{X})^2 + n\sum(\bar{X}_c-\bar{X})^2 + n\sum(\bar{X}_t-\bar{X})^2 + \sum_{i=1}^{n}\sum_{j=1}^{n}(X_{ij}-\bar{X}_r-\bar{X}_c-\bar{X}_t+2\bar{X})^2 \qquad (13.05)$$

in which the five terms, from left to right, are numbered (1) to (5), and
where $\bar{X}_r$ and $\bar{X}_c$ represent the means of rows and columns, respectively;
$\bar{X}_t$ is the mean of a treatment; and $X_{ij}$ is the value of the item in the ith
row and the jth column.

The corresponding equation for the degrees of freedom is

$$(n^2-1) = (n-1) + (n-1) + (n-1) + (n-1)(n-2) \qquad (13.06)$$

with the terms again numbered (1) to (5).


The calculations for the sums of squares are as follows:

(1) Total:       $\sum\sum(X_{ij}-\bar{X})^2 = \sum\sum X_{ij}^2 - \dfrac{T^2}{n^2}$
                 (T = grand total of all plots)
(2) Rows:        $n\sum(\bar{X}_r-\bar{X})^2 = \dfrac{\sum T_r^2}{n} - \dfrac{T^2}{n^2}$
                 ($T_r$ = total for one row)
(3) Columns:     $n\sum(\bar{X}_c-\bar{X})^2 = \dfrac{\sum T_c^2}{n} - \dfrac{T^2}{n^2}$
                 ($T_c$ = total for one column)
(4) Treatments:  $n\sum(\bar{X}_t-\bar{X})^2 = \dfrac{\sum T_t^2}{n} - \dfrac{T^2}{n^2}$
                 ($T_t$ = total for one treatment)
(5) Error:       $\sum\sum(X_{ij}-\bar{X}_r-\bar{X}_c-\bar{X}_t+2\bar{X})^2 = (1) - (2) - (3) - (4)$

The standard error in a Latin square is

$$s = \sqrt{\dfrac{\sum\sum(X_{ij}-\bar{X}_r-\bar{X}_c-\bar{X}_t+2\bar{X})^2}{(n-2)(n-1)}} \qquad (13.07)$$

The standard error for the mean of one treatment is

$$s_{\bar{X}_t} = \dfrac{s}{\sqrt{n}} \qquad (13.08)$$

ANALYSIS OF VARIANCE OF THE LATIN SQUARE

Source of variation   D.F.              Sums of squares   Mean square              Variance ratio
Rows                  (n - 1)           (2)               (2)/(n - 1)              $F_1$
Columns               (n - 1)           (3)               (3)/(n - 1)              $F_2$
Treatments            (n - 1)           (4)               (4)/(n - 1)              $F_3$
Error                 (n - 2)(n - 1)    (5)               (5)/[(n - 2)(n - 1)]
Total                 $n^2 - 1$         (1)
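The Latin-square computing formulas can be sketched in Python as follows. The function name `latin_square_ss` and the 3 × 3 data are hypothetical; the data were constructed here to be exactly additive in rows, columns, and treatments so that the error term should vanish, which makes the sketch easy to check.

```python
def latin_square_ss(values, layout):
    """Sums of squares for an n x n Latin square.

    values[i][j] is the observation in row i, column j;
    layout[i][j] is the treatment letter applied to that plot.
    """
    n = len(values)
    cells = [(i, j) for i in range(n) for j in range(n)]
    grand_total = sum(values[i][j] for i, j in cells)
    correction = grand_total ** 2 / n ** 2               # T^2 / n^2

    total = sum(values[i][j] ** 2 for i, j in cells) - correction
    rows = sum(sum(row) ** 2 for row in values) / n - correction
    cols = sum(sum(values[i][j] for i in range(n)) ** 2
               for j in range(n)) / n - correction
    treatment_totals = {}
    for i, j in cells:
        key = layout[i][j]
        treatment_totals[key] = treatment_totals.get(key, 0) + values[i][j]
    treatments = sum(t * t for t in treatment_totals.values()) / n - correction
    error = total - rows - cols - treatments              # (1)-(2)-(3)-(4)

    return {"total": total, "rows": rows, "columns": cols,
            "treatments": treatments, "error": error,
            "error_df": (n - 1) * (n - 2)}

# Hypothetical additive data: row effects 0, 10, 20; column effects 0, 1, 2;
# treatment effects A = 0, B = 100, C = 200.
layout = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
values = [[0, 101, 202], [110, 211, 12], [220, 21, 122]]
ss = latin_square_ss(values, layout)
print(ss)
```

With real data the error sum of squares would be positive, and dividing it by (n - 1)(n - 2) gives the error mean square used as the denominator of the three variance ratios.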


A Greco-Latin square is formed by a pair of
Latin squares, one written with Latin, the other with Greek, letters,
which, when superimposed, possess the property that each Latin letter
occurs once in each row and in each column, and each Greek letter
appears once in each row and in each column, and once with each Latin
letter (Ref. 7). Thus:

                A Greco-Latin Square.

                Aα    Bβ    Cγ
                Bγ    Cα    Aβ
                Cβ    Aγ    Bα
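The defining property of the pair of squares shown above can be verified mechanically. In this Python sketch the function names are assumptions introduced for illustration; the two squares are the Latin and Greek components of the 3 × 3 example.

```python
from itertools import product

latin = [["A", "B", "C"],
         ["B", "C", "A"],
         ["C", "A", "B"]]
greek = [["α", "β", "γ"],
         ["γ", "α", "β"],
         ["β", "γ", "α"]]

def is_latin(square):
    """True if every symbol occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok

def superimpose_once_each(first, second):
    """True if superimposing the squares yields every ordered pair once."""
    n = len(first)
    pairs = {(first[i][j], second[i][j])
             for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n

print(is_latin(latin), is_latin(greek), superimpose_once_each(latin, greek))
```

Superimposing a Latin square on itself fails the last check, since only n of the n² possible pairs appear; that contrast is what distinguishes a Greco-Latin pair from two arbitrary Latin squares.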


The two squares are orthogonal to each other. Orthogonality is that
property of an experimental design which makes possible the direct
and separate estimates of each of the several effects. From analytical
geometry it is recalled, for instance, that two planes,

                ax + by + cz + d = 0
        and     a'x + b'y + c'z + d' = 0

are orthogonal (perpendicular) if aa' + bb' + cc' = 0. The principle of
orthogonality is a basic one in modern experimental designs.

A Latin-Square Design in Psychology. Although the Latin square
was originally designed in agricultural experimentation to eliminate from
the experimental comparisons possible differences in soil fertility among
plots in rows and in columns, it has found useful application in other fields.
It is especially advantageous when the disturbing effects of two factors
need to be eliminated from the experimental comparisons. In experiments
in psychology, for example, the effect of the sequence or order
of the experimental factors or situations in space or in time may need
elimination.

Thus, in an experiment (Ref. 9) the object was to find out the effect
on recognition of colors when they were presented to the dark-adapted
eye of the subject under different degrees of illumination. The following
analysis-of-variance table reveals the skeleton of the experimental design
and the corresponding divisions of sums of squares and mean squares
into the several sources of variation:


Source of variation                       D.F.   Sum of squares   Mean square
Among orders of presentation (rows)         3    $Ss_o$           $Ss_o/3$
Among illumination levels (columns)         3    $Ss_i$           $Ss_i/3$
Among colors                                3    $Ss_c$           $Ss_c/3$
Experimental error                          6    $Ss_e$           $Ss_e/6$
Total                                      15    $Ss_T$

$$F = \dfrac{\text{mean square (color)}}{\text{mean square (error)}}$$

[...] of the data was necessary (see page 165).
An extension of the ex[...] form occurs. Such color-form [...]
Latin square.

[...] experiment is designed and executed [...] answers to definite
questions. The worth of the experiment is contingent on how wisely the
questions have been conceived and formulated. It is fundamental to
understand thoroughly the purpose and ultimate applicability of the
experiment. A big advantage for complex experiments, that is, those
designed to secure answers to a number of definite questions, lies in the
fact that they afford results of wider applicability than do simple ones.
Until recently, it was regarded as essential that an experiment should be
simple and restricted to answering a single question regarding the effect
of a single factor. It is important in setting forth the plans of an
experiment to answer the questions which prompted the research, to list
all the variables that might conceivably influence the results. Due
attention must be given to the possible results and their interpretation.
Even after listing all the variables that occur to the experimenter, there
are others which are not suspected. As many as possible of the variables
need to be controlled. However, it is usually desired to secure comparisons
under a wide range of conditions of certain variables. In carrying out
comparisons of two treatments, for instance, under the same conditions,
the relative efficacy may be accurately determined under certain fixed
conditions. However, unless these experimental conditions duplicate the
practical conditions, the findings of the former may not be applicable at
all to the latter. An average value of the ratio of the measures of the
treatment effects over a range of conditions is usually the quantity wanted
in practical application. In experiments based on the assumption of
controlling all factors except the one under investigation, it is often
observed that the results change from one experiment to another of the
same kind. The difficulty or impossibility of controlling or isolating the
various factors involved in experimentation precluded conclusive results
in most cases of the traditional "controlled" experiment. Furthermore, as
pointed out above, it is usually most important to observe the effects of
factors in as nearly a natural setting as possible.

The desideratum in experimentation of observing the effects of
varying all the essential conditions simultaneously rather than one at a
time attains a substantial realization in the modern methods of design
devised to cope with this problem. A very considerable advance has
been brought about by the factorial design in experimentation. In this
[...] examined are varied concurrently in all possible combinations. The
principal advantages of this type of design over the traditional experiment
planned to ex[...] single factor, consist in its greater efficiency and
comprehensiveness.

This superiority is achieved through the fact that in a factorial
experiment, every trial contributes to the answering of every question with
almost the same precision as though the whole experiment had been given
over to any one of them. In addition to measuring the effect of each
of the single factors, the measures of the effects of the interaction of all
combinations of factors are made with the same precision. The latter
advantage is especially great, since, with separate single-factor
experiments, information could not possibly be deduced concerning the
interaction of the different factors.


The investigation of the interactions, though a highly important
consideration, frequently was overlooked completely until appropriate
means for the measurement of these interactions were developed. A
third distinct advantage of factorial design is that this plan gives results
of wider applicability than do single experiments, since the exact
standardization of experimental conditions prescribed for the traditional
experimental design gives information only in respect to a narrowly
restricted set of conditions. In the factorial design the ingredients may
be varied, that is, applied at different levels, whereas in the single-factor
experiments standardization requires that the other factors be kept
constant. Rarely is it possible to achieve the degree of standardization
required for conclusive results.

A Factorial Experiment in Psychology. The principles of factorial
design are illustrated by presenting the design and the analysis of the
results of an experiment in psychology.

[...], and 200 grams per 30 seconds. [...] women constituted the
experimental [...] normally sighted; two of each sex were [...]. [...] limen
values were determined for each subject on each of the 28 rate-weight
combinations. The order of presentation [...] established in advance by the
use of Fisher [...]. The reality of the subject's [...] randomly introduced.
The entire [...] after an interval of one week. Thus, there were [...] of
the eight subjects. The experimental arrangement may be called a
4 × 7 × 2 × 2 × 2 factorial design, that is, the combination of 4 rates,
7 weights, 2 sights, 2 sexes, and 2 dates.

The mean D.L.-value of five trials f[...] weight-rate combinations for [...]
statistical analysis.* Let us designate [...] variables. The individuals
[...] different weights: the weight of 100 grams is
denoted by 1; of 150, by 2; of 200, by 3; of 250, by 4; of 300, by 5; of
350, by 6; and of 400, by 7. Each weight is combined with each of the
four rates: 50 grams per 30 seconds is denoted by a; 100, by b; 150, by c;
and 200, by d. Observations of each individual trial were obtained on
two different dates: the first date is denoted by α, and the second by β.
Hence, we have 2 × 2 × 7 × 4 × 2 = 224 subgroups. Furthermore,
we have two people for each subgroup, denoted by (1) and (2). Altogether,
then, we have 448 measures of D.L.-values (see Table 89).

* [...] normally distributed with a common variance, was studied [...]
both for totals and subgroups. Within the limitations of the method, the
assumptions appeared satisfied.
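The count of subgroups and measures follows directly from the numbers of levels, as the short Python sketch below shows. The dictionary `levels` and the variable names are introduced here for illustration.

```python
from itertools import product

# Number of levels of each factor in the 4 x 7 x 2 x 2 x 2 design.
levels = {"sex": 2, "sight": 2, "weight": 7, "rate": 4, "date": 2}
individuals_per_subgroup = 2

# Every combination of one level from each factor is one subgroup.
subgroups = list(product(*(range(count) for count in levels.values())))
n_subgroups = len(subgroups)
n_measures = n_subgroups * individuals_per_subgroup

print(n_subgroups, n_measures)
```

Enumerating the combinations explicitly, rather than only multiplying the level counts, gives a list that could be used to index the cells of a table such as Table 89.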

Mathematically, each measure is denoted by $X_{sijklt}$, which is the
score made by the tth individual of the sth sex and the ith sight for the
jth weight and the kth rate on the lth date. The mathematical expression
of the D.L.-value of the tth individual in the sth sex of the ith sight on
the jth weight of the kth rate at the lth date is

$$\begin{aligned}
X_{sijklt} = {} & A + B_s + C_i + D_j + E_k + F_l \\
& + I_{si} + I_{sj} + I_{sk} + I_{sl} + I_{ij} + I_{ik} + I_{il} + I_{jk} + I_{jl} + I_{kl} \\
& + I_{sij} + I_{sik} + I_{sil} + I_{sjk} + I_{sjl} + I_{skl} + I_{ijk} + I_{ijl} + I_{ikl} + I_{jkl} \\
& + I_{sijk} + I_{sijl} + I_{sikl} + I_{sjkl} + I_{ijkl} + I_{sijkl} + Z_{sijklt}
\end{aligned} \qquad (13.09)$$

where s, i, j, k, l, and t refer to sex, sight, weight, rate, date,
and individual, respectively; A is the grand mean of all
individuals; B, C, D, E, and F are the measures of the main effects with
respect to their own subscripts; the I's are the measures of interactions
with respect to their own subscripts; and $Z_{sijklt}$ is experimental error.
The mathematical solution of the problem for securing the maximum
likelihood estimates of each of the components in (13.09) is the same
as that used in Chapter XI. In order to save space, we shall simply
summarize all the results given in Table 90.

We wish to evaluate the 33 terms (listed below) in order to obtain
all the sums of squares for the complete analysis of variance. To get
the value of the term [...], we simply work out the sums [...]: (1) [...]
each sum of scores in the appropriate table;² (2) add the [...]; (3) divide
by the appropriate number which refers to the individual measures
involved in each sum of scores. The [...]: (1) work out the square for each
[...]; (2) add these squares; (3) multiply
by the appropriate number which refers to the individual measures
involved in each mean score (this number will be the same as in the first
method). We prefer to use the first method in calculation since it is
more accurate from the viewpoint of significant figures. We use the
second method, since it is simpler, in the presentation of the formulas.

² The sum of scores is obtained by adding the scores of (1) and (2) of
Table 89. Thus: [...]. The mean score is obtained by dividing the sum of
scores by 2: 18.5/2 = 9.25. Mathematically, the sum of scores is denoted
by $\sum_t X_{sijklt}$, where $\sum_t$ means the summation of the two
individuals; and the mean score is denoted by $\bar{X}_{sijkl\cdot}$, which is
the mean [...] the sth sex and the ith sight for the jth weight and the kth
rate on the lth date.

By following the working procedure indicated in method 1, the values
of all the 33 terms for our problem are obtained as follows:


$$\sum_s\sum_i\sum_j\sum_k\sum_l\sum_t X_{sijklt}^2 = 441{,}140.30$$

$$\sum_s\sum_i\sum_j\sum_k\sum_l\dfrac{\left(\sum_t X_{sijklt}\right)^2}{2} = 2\sum_s\sum_i\sum_j\sum_k\sum_l \bar{X}_{sijkl\cdot}^2 = 406{,}702.49$$

[The intermediate terms have the same form, each being a multiple of the
sum of squared means over a smaller set of factors. Their subscripts are
not legible, and two of the values are missing. The legible values, in the
printed order, are:]

    402,722.52    342,295.66    395,929.74    370,451.44
    348,532.41    340,052.38    395,333.63    334,642.70
    368,014.35    311,866.98    365,342.11    346,776.80
    293,525.33    341,930.40    334,243.15    331,338.72
    310,616.80    365,094.64    308,428.40    292,577.43
    341,734.41    288,699.87    330,141.20    328,018.72
    288,579.46    308,303.02    327,841.31    276,885.61

$$\dfrac{\left(\sum_s\sum_i\sum_j\sum_k\sum_l\sum_t X_{sijklt}\right)^2}{448} = 448\bar{X}^2 = 276{,}769.37$$

Substituting the above values in the appropriate formulas of Table 90,
we obtain the specific sums of squares necessary for the complete analysis
of variance.

We first test the significance of each of the interactions,³ of which
there are 10 of the first order, 10 of the second, 5 of the third, and 1 of
the fourth order. It is customary to call the interaction involving
2 factors an interaction of the first order; one involving 3 factors, 1 of the
second order, and so on. The test of the significance of these interactions
is given in Table 91. It is noted that the following interactions were
significant:

        sex × sight × rate          sight × rate
        sex × sight                 sight × weight (doubtful)
        sex × rate

The significant (including the doubtful) interactions were retained
as specific components in the analysis-of-variance table. The statistically
non-significant interactions were incorporated in experimental error.
The complete analysis of variance and the results of the corresponding
tests of the respective hypotheses are given in Table 92.

³ When two or more factors are involved such that increases or decreases
in one (or more) influence increases or decreases in the other(s), or vice
versa, interaction is said to exist.
Said 
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TABLE 91 
TESTS OF SIGNIFICANCE OF INTERACTIONS AS SOURCES OF VARIATION

Source of variation                     D.F.   Sum of     Mean      F       Test of
                                               squares    square            hypothesis*
Error                                   224    34,438       154     ...     ...
Sex × sight × weight × rate × date       18       270        15     ...     Accepted
Sex × sight × weight × rate              18       320        18     ...     Accepted
Sex × sight × weight × date               6       379        63     ...     Accepted
Sex × sight × rate × date                 3        60        20     ...     Accepted
Sex × weight × rate × date               18       538        30     ...     Accepted
Sight × weight × rate × date             18       205        11     ...     Accepted
Sex × sight × weight                      6     1,406       234     1.52    Accepted
Sex × sight × rate                        3     2,216       739     4.80    Rejected
Sex × sight × date                        1       270       270     1.75    Accepted
Sex × weight × rate                      18       215        12     ...     Accepted
Sex × weight × date                       6       637       106     ...     Accepted
Sex × rate × date                         3        61        20     ...     Accepted
Sight × weight × rate                    18       654        36     ...     Accepted
Sight × weight × date                     6       340        57     ...     Accepted
Sight × rate × date                       3        14         5     ...     Accepted
Weight × rate × date                     18       527        29     ...     Accepted
Sex × sight                               1    14,130    14,130    91.75    Rejected
Sex × weight                              6       405        68     ...     Accepted
Sex × rate                                3     5,720     1,907    12.38    Rejected
Sex × date                                1         9         9     ...     Accepted
Sight × weight                            6     2,089       348     2.26    Remains in doubt
Sight × rate                              3     2,083       694     4.51    Rejected
Sight × date                              1     [...]     [...]     ...     [...]
Weight × rate                            18       391        22     ...     Accepted
Weight × date                             6       488        81     ...     Accepted
Rate × date                               3        61        20     ...     Accepted

* The hypothesis tested is a null hypothesis regarding the variation in the same
row. For example, the hypothesis regarding sex × sight × weight × rate × date is
that there is no significant interaction.

[...]

significant main effects:                sex, sight, weight, and rate
significant second-order interactions:   sex × sight × rate
significant first-order interactions:    sex × sight
                                         sex × rate
                                         sight × weight
                                         sight × rate

[...]nificant. This result demonstrates that the observations were
consistent among themselves.


TABLE 92
COMPLETE ANALYSIS OF VARIANCE OF D.L.-VALUES

Source of variation     D.F.   Sum of      Mean       F        Test of
                               squares     square              hypothesis*
Residual                419     41,692        100     ...      ...
Sex × sight × rate        3      2,216        739     7.39     Rejected
Sex × sight               1     14,130     14,130   141.30     Rejected
Sex × rate                3      5,720      1,907    19.07     Rejected
Sight × weight            6      2,089        348     3.48     Rejected
Sight × rate              3      2,083        694     6.94     Rejected
Sex                       1     31,534     31,534   315.34     Rejected
Sight                     1     11,810     11,810   118.10     Rejected
Weight                    6      1,909        318     3.18     Rejected
Rate                      3     51,072     17,024   170.24     Rejected
Date                      1        116        116     1.16     Accepted
Total                   447    164,371

* The hypothesis tested is a null hypothesis regarding the variation in the same
row. For example, the hypothesis concerning date is that there is no significant
difference between the date means.
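The residual line of Table 92 can be recovered by subtraction from the total, as in the following Python sketch. The dictionary `retained` simply transcribes the retained rows of the table; the assignment of the two 1-degree-of-freedom main-effect rows to sex and sight is an assumption based on the order of the factors, and the variable names are introduced here for illustration.

```python
# Retained sources of Table 92 as (degrees of freedom, sum of squares).
retained = {
    "sex × sight × rate": (3, 2216),
    "sex × sight": (1, 14130),
    "sex × rate": (3, 5720),
    "sight × weight": (6, 2089),
    "sight × rate": (3, 2083),
    "sex": (1, 31534),       # assumed label, see lead-in
    "sight": (1, 11810),     # assumed label, see lead-in
    "weight": (6, 1909),
    "rate": (3, 51072),
    "date": (1, 116),
}
total_df, total_ss = 447, 164371

# Everything not retained is pooled into the residual.
residual_df = total_df - sum(df for df, _ in retained.values())
residual_ss = total_ss - sum(ss for _, ss in retained.values())
residual_ms = residual_ss / residual_df

f_ratios = {name: (ss / df) / residual_ms for name, (df, ss) in retained.items()}

print(residual_df, residual_ss, round(residual_ms, 2))
for name, ratio in f_ratios.items():
    print(f"{name:<20} F = {ratio:.2f}")
```

The residual mean square comes out just under 100; the table rounds it to 100, so ratios computed from the unrounded value differ slightly from the printed F column.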


From the standpoint of the efficiency of the factorial design in this
experiment, it can be said that we have tested 26 hypotheses regarding
interactions and 5 hypotheses concerning main effects. If we had used
the single-factor plan of experiment, we should have required 56
experiments for testing the main effects of rate; 32, for weight; 112, for sex;
112, for sight; and 112, for date. We also would have had to repeat
the t-test [...] times. Furthermore, no information
would be possible concerning the interaction effects.
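The numbers of single-factor experiments quoted above are the products of the level counts of the other four factors, as this Python sketch shows. The function name `single_factor_experiments` is an assumption introduced for illustration.

```python
from math import prod

levels = {"sex": 2, "sight": 2, "weight": 7, "rate": 4, "date": 2}

def single_factor_experiments(factor):
    """Experiments needed to study one factor at every fixed combination
    of the levels of all the other factors."""
    return prod(count for name, count in levels.items() if name != factor)

needed = {name: single_factor_experiments(name) for name in levels}
print(needed)
```

Each such experiment holds the other four factors fixed at one combination of levels, which is why the count for a factor grows with the number of levels of every factor except itself.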

The Problem of Prediction. The regression equations of D.L.-values
on each of the factors and interacting factors, which were found to be
significant, can be determined. With these equations it is possible to
compute D.L.-values for any particular value of the independent variable
within the range of factor levels used in the experiment.*

* For [...] of the other significant factors in this study, see the
original [...]. [...] references are 2 and 10, particularly 10, for the
discussion of the meaning of the linear, quadratic, and cubic terms.

We shall illustrate the use of orthogonal polynomials for determining
the regression equation for predicting D.L.-values from weights.
We proceed to work out linear, quadratic, and cubic regression equations.
Only the linear coefficient was found significant here, but the
methods of calculating the latter two are also given. We shall show the
method of separating effects associated with more than one degree of
freedom into component parts that are mutually orthogonal. Because 
of the latter property, the components may be estimated from the data. 
If in our experiment there is only 1 degree of freedom representing the 
tested variation, for example, sex and date, there can be only a linear 
relation between the two levels of variation and the D.L.-values. If 
there are more than 2 degrees of freedom or more than 3 levels of varia- 
tion, then these can be separated into component parts—linear, quadratic, 
cubic, and so on—that are mutually independent. Even when there are 
more than 3 degrees of freedom or more than 4 levels of variation, we 
usually do not calculate terms higher than the cubic. 

We first record the means of the D.L.-values found for each weight
and transform them as follows:

    W (weight)    Y (D.L.-value)      x          y
       100           28.7922         -3       3.9368
       150           26.4141         -2       1.5587
       200           25.6344         -1       0.7790
       250           23.7453          0      -1.1100*
       300           23.8297          1      -1.0257
       350           23.2563          2      -1.5991
       400           22.3156          3      -2.5397

    $\bar{W} = 250$        $\bar{Y} = 24.8554$        $\sum x^2 = 28$

where $x = \dfrac{W - 250}{50}$ and $y = Y - 24.8554$.


We then refer to the tables of Fisher and Yates on orthogonal poly-
nomials (Ref. 8) for N = 7, which reads:

      $\xi_1'$     $\xi_2'$     $\xi_3'$
       -3        5       -1
       -2        0        1
       -1       -3        1
        0       -4        0
        1       -3       -1
        2        0       -1
        3        5        1

    $\sum \xi_1'^2 = 28$     $\sum \xi_2'^2 = 84$     $\sum \xi_3'^2 = 6$

    $\lambda_1 = 1$     $\lambda_2 = 1$     $\lambda_3 = \tfrac{1}{6}$


Finally, we obtain all the regression equations as follows:

Linear: $Y = c_0 + c_1 x$ (13.10)

where $c_0 = \bar{Y} = 24.8554$ and $c_1 = \sum \xi_1' y / \sum \xi_1'^2$.

Quadratic: $Y = c_0' + c_1 x + c_2 x^2$ (13.11)

Cubic: $Y = c_0' + c_1 x + c_2 x^2 + c_3 x^3$ (13.12)

where

$c_0 = \bar{Y}$ (13.13)

$c_1 = \lambda_1 \dfrac{\sum \xi_1' y}{\sum \xi_1'^2}$ (13.14)

$c_2 = \lambda_2 \dfrac{\sum \xi_2' y}{\sum \xi_2'^2}$ (13.15)

$c_3 = \lambda_3 \dfrac{\sum \xi_3' y}{\sum \xi_3'^2}$ (13.16)

$c_0' = c_0 - \dfrac{N^2 - 1}{12}\, c_2$ (13.17)


The calculation of the regression coefficients for weights is carried
out as follows:

     x        y      $\xi_1'$    $\xi_1' y$     $\xi_2'$    $\xi_2' y$     $\xi_3'$    $\xi_3' y$
    -3     3.9368    -3    -11.8104     5     19.6840    -1    -3.9368
    -2     1.5587    -2     -3.1174     0      0.0000     1     1.5587
    -1     0.7790    -1     -0.7790    -3     -2.3370     1     0.7790
     0    -1.1100     0      0.0000    -4      4.4400     0     0.0000
     1    -1.0257     1     -1.0257    -3      3.0771    -1     1.0257
     2    -1.5991     2     -3.1982     0      0.0000    -1     1.5991
     3    -2.5397     3     -7.6191     5    -12.6985     1    -2.5397

    $\sum \xi_1' y = -27.5498$     $\sum \xi_2' y = 12.1656$     $\sum \xi_3' y = -1.5140$


By using Equations (13.13), (13.14), (13.15), (13.16), and (13.17),
we obtain

    $c_0 = 24.8554$
    $c_1 = -.983921$
    $c_2 = .144829$
    $c_3 = -.042056$
    $c_0' = 24.2761$

Hence, the regression equations can be obtained by substituting these
values into Equations (13.10), (13.11), and (13.12).

Linear:     $Y = 24.8554 - .983921x$ (13.18)
Quadratic:  $Y = 24.2761 - .983921x + .144829x^2$ (13.19)
Cubic:      $Y = 24.2761 - .983921x + .144829x^2 - .042056x^3$ (13.20)

where $x = \dfrac{W - 250}{50}$.
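The arithmetic behind these coefficients can be retraced with a short script. This is only a sketch under stated assumptions: the variable and function names are illustrative, the polynomial values are the Fisher and Yates entries for N = 7 quoted above, and the script works from the unrounded mean, so the last printed digit may differ slightly from the text.

```python
# Mean D.L.-values for the seven weights (from the table of means).
weights = [100, 150, 200, 250, 300, 350, 400]
means = [28.7922, 26.4141, 25.6344, 23.7453, 23.8297, 23.2563, 22.3156]

# Orthogonal polynomial values for N = 7.
xi1 = [-3, -2, -1, 0, 1, 2, 3]   # linear,    lambda = 1
xi2 = [5, 0, -3, -4, -3, 0, 5]   # quadratic, lambda = 1   (equals x^2 - 4)
xi3 = [-1, 1, 1, 0, -1, -1, 1]   # cubic,     lambda = 1/6 (equals (x^3 - 7x)/6)


def contrast(xi, y):
    """Return sum(xi * y) / sum(xi^2) for one polynomial component."""
    return sum(a * b for a, b in zip(xi, y)) / sum(a * a for a in xi)


y_bar = sum(means) / len(means)
y = [m - y_bar for m in means]   # deviations from the mean

c0 = y_bar
c1 = contrast(xi1, y)            # lambda_1 = 1
c2 = contrast(xi2, y)            # lambda_2 = 1
c3 = contrast(xi3, y) / 6        # multiplied by lambda_3 = 1/6
c0_prime = c0 - 4 * c2           # (N^2 - 1)/12 = 4 when N = 7


def predict_linear(weight):
    """Linear prediction of the D.L.-value for a weight in grams."""
    x = (weight - 250) / 50
    return c0 + c1 * x


print(round(c1, 4), round(c2, 4), round(c3, 4), round(c0_prime, 4))
```

Expanding the cubic polynomial in full would also shift the coefficient of x by $-7c_3$; the sketch leaves that aside because only the linear equation is used for prediction.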
The test of significance of the components of variation due to weight
is given in Table 93.


TABLE 93
COMPONENTS OF VARIATION DUE TO WEIGHT

    Source of variation    D.F.    Sum of     Mean       F       Test of
                                   squares    square             hypothesis

    Linear                   1      1735       1735     17.35    Rejected
    Quadratic                1       113        113      1.13    Accepted
    Cubic                    1        24         24              Accepted
    Remainder                3        37         12              Accepted
    Total                    6      1909


It is noted from Table 93 that only the linear component is significant.
Hence, only the linear equation is to be used in prediction. The graph
of the linear regression equation for the observed D.L.-values is sketched
in Fig. 8.

[Figure 8. Linear regression line of the equation for predicting D.L.-values
from weight values. Vertical axis: D.L.-values (grams); horizontal axis:
weight (grams), 100 to 400.]


Design and Covariance in a Study of Educational Development.
We wish to illustrate further application of the principles of
factorial design by presenting the results of an investigation of individual
educational development.’ An application is also made in this study 
of the method of covariance which served to increase the precision of the 
experiment. The specific design developed for this study was a 2 X 3 
X 3 X 3 factorial type. The factors chosen for study were the 2 sexes, 
3 scholastic standings, 3 individual orders, and three school grades. 

In addition to the introduction of the covariance method for control- 
ling variables not controlled or controllable directly by the experimental 
design, this experiment differs from the one in psychology just reported 
in that the type of factorial design is of the kind in which absolute repli- 
cation is dispensed with and hidden replication is involved (Ref. 7). 
This type is desirable when large numbers of combinations are tested
simultaneously without repeated use of each combination. All the
independent comparisons contained in the experiment are allotted to the
factors tested and to their interactions. Since there is no independent
comparison ascribable to pure error, the highest order interactions are
employed as the basis for measuring the precision of the main com-
parisons. The situation in this study has a very wide occurrence in
research work. 

The criterion score used as a measure of the stage of educational 
development was based on a composite score comprised of the scores on 
nine separate tests (Ref. 13). The standard scores used, ranging from
9 to 80 with a mean of 15, were determined from the combined grades,
that is, the tenth, eleventh, and twelfth grades. There were 18 students
from each of the 3 grades, all chosen at random from the total number
enrolled in these grades. The mental-age scores were obtained from the
administration of a group test of mental ability and were calculated for
all students as of the same date. All students in the tenth grade were
of chronological age fifteen; in grade 11, sixteen; in grade 12, seven-
teen. Students were classified into one of three scholastic groups (good,
average, poor) based on their honor-point ratios. Individual order of
educational development was based on the size of the scores of the
individuals on the second of the two administrations of the battery of
tests. The interval between the two administrations was 12 months.

Let us denote the final score, the initial score, and the mental-age
score by $Y$, $X_1$, and $X_2$, respectively. Again, the two sexes are denoted
by I for the male and II for the female; the three grades are denoted by A
for grade 10; B, for grade 11; and C, for grade 12. The three scholastic
standings are denoted by 1 for the good, 2 for the average, and 3 for the
poor; and the three individual orders by α for the first, β for the second,
and γ for the third. The primary data grouped into the several sub-
classes in accordance with the notations specified are presented in Table
94.

* For the complete analysis of the experimental results in this investigation,
their interpretation, and the mathematical formulation and solution of the
problem, the reader is referred to Ref. 13.


TABLE 94
SCORES FOR ALL SEX × GRADE × SCHOLASTIC × INDIVIDUAL COMBINATIONS

                                      A                B                C
    Sex  Standing  Individual     Y   X1   X2      Y   X1   X2      Y   X1   X2

     II      1         α         21   16   44     26   22   60     33   29   94
                       β         21   21   44     25   22   57     29   29   89
                       γ         19   17    6     23   19   52     25   22   78

             2         α         20   18   38     22   19   54     23   21   50
                       β         18   16   27     21   19   54     18   19   57
                       γ         14   14   18     17   16   52     17   17   43

In our problem, we define:

$Y_{sijt}$ = the final standard score of the tth individual of the jth scholastic
standing in the ith grade and the sth sex,

$X_{1sijt}$ = the initial standard score of the tth individual of the jth
scholastic standing in the ith grade and the sth sex,

$X_{2sijt}$ = the mental-age score of the tth individual of the jth standing
in the ith grade and the sth sex.

Example 1. In order to evaluate the term $\sum\sum\sum\sum Y^2$, we simply
refer to Table 94 and work out all the squares of the Y-measures. Then
we sum these squares and obtain the required value, for example, $a_1$
= 22,730.


TABLE 96
SUM OF SCORES FOR EACH SEX × GRADE × SCHOLASTIC COMBINATION

                          A                   B                   C
    Sex  Standing    ΣY   ΣX1   ΣX2      ΣY   ΣX1   ΣX2      ΣY   ΣX1   ΣX2

     I      1        77    69   149      76    64   184      80    68   212
            2        57    50   104      62    56   133      62    55   186
            3        47    41    70      47    42    87      46    43    95
     II     1        61    54    94      74    63   169      87    80   261
            2        52    48    83      60    54   160      58    57   150
            3        35    23    41      47    41   116      40    36    85


Example 2. In order to evaluate the term

$\dfrac{1}{3} \sum\sum\sum \Big(\sum Y\Big)\Big(\sum X_1\Big)$,

we refer to Table 96, then compute all the products of ΣY and ΣX1 in
the same row. We then add these products and divide by 3 to obtain
the quantity: by, = 19,786. 

The sum of scores for each sex × grade × scholastic combination
as given in Table 96 was obtained by adding the scores for α, β, and γ
as given in Table 94. Thus: 30 + 25 + 22 = 77.
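The two bookkeeping steps just described (adding the three individual orders to get a cell sum, then summing products of cell sums and dividing by 3) can be retraced as follows. This is a sketch with illustrative names; the Table 94 rows used are the ones for sex II, standings 1 and 2, and the Table 96 entries are copied as printed.

```python
# (Y, X1, X2) for individual orders alpha, beta, gamma in each grade,
# copied from the Table 94 rows for sex II, standings 1 and 2.
table94_sex2 = {
    1: {"A": [(21, 16, 44), (21, 21, 44), (19, 17, 6)],
        "B": [(26, 22, 60), (25, 22, 57), (23, 19, 52)],
        "C": [(33, 29, 94), (29, 29, 89), (25, 22, 78)]},
    2: {"A": [(20, 18, 38), (18, 16, 27), (14, 14, 18)],
        "B": [(22, 19, 54), (21, 19, 54), (17, 16, 52)],
        "C": [(23, 21, 50), (18, 19, 57), (17, 17, 43)]},
}

# (sum Y, sum X1, sum X2) for grades A, B, C, copied from Table 96.
table96 = {
    ("I", 1): [(77, 69, 149), (76, 64, 184), (80, 68, 212)],
    ("I", 2): [(57, 50, 104), (62, 56, 133), (62, 55, 186)],
    ("I", 3): [(47, 41, 70), (47, 42, 87), (46, 43, 95)],
    ("II", 1): [(61, 54, 94), (74, 63, 169), (87, 80, 261)],
    ("II", 2): [(52, 48, 83), (60, 54, 160), (58, 57, 150)],
    ("II", 3): [(35, 23, 41), (47, 41, 116), (40, 36, 85)],
}


def cell_sums(rows):
    """Add the three individual orders column by column."""
    return tuple(sum(column) for column in zip(*rows))


# Step 1: rebuild the sex II rows of Table 96 from Table 94.
rebuilt = {
    ("II", standing): [cell_sums(grades[g]) for g in ("A", "B", "C")]
    for standing, grades in table94_sex2.items()
}

# Step 2: sum the products (sum Y)(sum X1) over all 18 cells, divide by 3.
product_term = sum(sy * sx1 for cells in table96.values()
                   for sy, sx1, _ in cells) / 3

print(rebuilt[("II", 1)][0], product_term)
```

The rebuilt sums agree with the corresponding Table 96 rows, and the product term comes to 19,786, the value quoted in Example 2.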

By following similar procedures as illustrated for Examples 1 and 2, 
es for the 96 terms extending from a; through €6. 
Here we shall present the results based on one analysis only: the 


complete analysis of variance and covariance partialing out the effects of 
both initial score and mental age. 


specified in Table 95: 
а = у 2 I у Үз» = 29,730 

44 3 € 
а = y y b У X, = 17,926 

© t3 £ 


° For a complete analysis see Ref. 13. The exami ti f i der- 
| [ у 18. the assumptions unde: 
lying the analysis of variance and covariance led pg i they 
could be tested. See pages 218-219 and R Tin Chanter Lome pay te 


uper iT ef. 1 in Chapter X, and pages 251-260 in 
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54 

(DIED Xe) 02225) 

ee rr: 72-831 = 41,588 
54 

The application of the method involves the calculation of the sums of
squares of the dependent variable and of each of the two independent
variables, and the sums of products of each of the independent variables
with the dependent variate to be adjusted and with each other. These
values are obtained by applying the appropriate formulas in Table 95.

We first test the significance of the interactions. The complete
analysis resulting in the tests of significance of the several hypotheses is
given in Table 97. Since the adjustment for the two concomitant
variates has been obtained from the error term, 2 degrees of freedom
ascribed to error have been used in evaluating it. The reduced sum of
squares assigned to error is divided by the corresponding number of
degrees of freedom to obtain the mean square (1.41) appropriate to test-
ing the significance of the remaining interactions. No significant inter-
action was found. Therefore, 44 degrees of freedom became available
for testing the significance of the main effects.
The complete analysis of variance and covariance of the final scores,
partialing out the joint effect of initial score and mental-age score, is
presented in Table 98. The analysis, which has used all the evidence of the
relevant data, led to the conclusion that there was a significant difference
among the means of the final scores of the scholastic groups and of the
individual orders of development when adjustments were made for the
differences in initial and mental-age scores. The difference between the
adjusted means of the sexes was significant at the 5 per cent level.

The whole procedure of making an exact test of significance based on the
reduced $\sum y^2$ when there are two independent variates is illustrated for
the test of significance for "grade" in Table 99.⁷

TABLE 99
ILLUSTRATION OF TEST OF SIGNIFICANCE WITH REDUCED $\sum y^2$
(Partialing out the effects of $X_1$ and $X_2$)

⁷ For the detailed solution of the problem of estimation, see Ref. 13.


PROBLEMS

1. Design an experiment to determine the effect of training upon individual
differences.

2. Design a factorial experiment for determining the effect of practice
of different levels and kinds upon transfer of training.

3. Design a factorial experiment to determine the effect of various
lengths and frequencies of intervals upon learning a fundamental process
in arithmetic.

4. Design an educational experiment which makes use of the Latin-Square
arrangement.

5. Devise a method of comparing the efficiency from the use of the follow-
ing three different types of experimental design: Assume the experi-
ment is designed to determine if there is a differential effect of three
different treatments (for example, different dietary treatments on
school children). Let A, B, and C represent the three treatments; O,
the dummy treatment; I, II, III, the three school terms. In Design 1
the possible diet sequences are given by 1, 2, 3, . . . , 24.

Design 1


di 
term II 
II 


2 

School I|CO 
term II| CO 
II|CO 


In Design 2, the same treatment is administered to the same child 
throughout the three school terms. The treatments are randomized 
in blocks of 4 children, who are selected to be as alike as possible. 


Design 3

    Child              1 2 3 4 | 5 6 7 8
    School term I      O B C A | O C B A
                II     B A C O | O C A B
                III    A C O B | A C O B


In Design 3, the treatments within each block of 4 children for each
term are rerandomized.

6. Assume there are 15 persons who are to be invited to 35 dinners, and
that 3 persons are to take part in each dinner. Arrange the invitations
for dinner so that each person is invited 7 times, and 2 persons meet
at a dinner just 1 time.
CHAPTER XIV
MULTIPLE REGRESSION PROBLEMS 


It frequently happens in experimental situations that we are concerned 
with the problem of estimating or predicting one character from a 
knowledge of another or of a number of other characters. For prediction 
or estimation of this kind to be useful, it is necessary that a change in 
the variable to be predicted is accompanied by some corresponding
change in the other variable or variables. Problems of this kind require 
the quantification of this apparent relationship existing among the vari- 
ables and are spoken of as problems in regression. 

In the simple case of the regression of one variate on another, the 
regression function takes the form 

$$Y' = a + b(X - \bar{X}) \tag{14.01}$$

where b is the regression coefficient of Y on X, and Y' is the predicted
value of Y for each value of X.

The Multiple Regression Equation. If, instead of having only one
independent variate, such as X in the simple case above, we have meas-
ures on several independent variables, then we can express the mean
value of the dependent variate, Y, in terms of the several independent
variates. This is the multivariate case to be treated in this section.

We denote by $Y_t$ the value of the criterion variable, and by $X_{it}$ the
value of the ith measurement of the tth individual, respectively. Then
the multiple regression equation (or, more accurately, the partial regres-
sion equation) for obtaining the simple weighted sum of the measure-
ments, $Y_t'$, may be written

$$Y_t' = a_0 + b_1 X_{1t} + b_2 X_{2t} + \cdots + b_k X_{kt} \tag{14.02}$$

where it is assumed that we have the value of the criterion variable,
Y, and k measurements of each individual. In Equation (14.02), $a_0$ is
a constant; the b's are known as the partial regression coefficients and are
also constants. Instead of the symbol $b_1$, for instance, the symbol
$b_{y1.23 \ldots k}$ or $b_{01.23 \ldots k}$ is often used. This subscript indicates
more completely than $b_1$ that the partial regression coefficients show
how greatly unit changes in the individual X variables affect $Y_t$, inde-
pendently and directly. The values of these constants are to be deter-
mined in each case from the available data.

If we let $y_t, y_t', x_{1t}, x_{2t}, \ldots, x_{kt}$ represent the deviations from the
respective means of the variables, there is no need for the term $a_0$ in
Equation (14.02), because

$$\sum y_t = \sum y_t' = \sum x_{1t} = \cdots = \sum x_{kt} = 0$$

In terms of this notation, Equation (14.02) becomes

$$y_t' = b_1 x_{1t} + b_2 x_{2t} + \cdots + b_k x_{kt} \tag{14.03}$$


In order that $y_t'$ be the best linear estimate of $y_t$, when "best" is con-
sidered in the light of the least-squares criterion, $\sum (y_t - y_t')^2$ must be
minimized; that is,

$$\sum (y - b_1 x_1 - b_2 x_2 - \cdots - b_k x_k)^2 \tag{14.04}$$

must be minimized. A necessary and sufficient condition for this min-
imum sum of squares is that the b's satisfy the following system of
equations:

$$\begin{aligned}
\sum x_1 (y - b_1 x_1 - b_2 x_2 - \cdots - b_k x_k) &= 0 \\
\sum x_2 (y - b_1 x_1 - b_2 x_2 - \cdots - b_k x_k) &= 0 \\
&\;\;\vdots \\
\sum x_k (y - b_1 x_1 - b_2 x_2 - \cdots - b_k x_k) &= 0
\end{aligned} \tag{14.05}$$

The left members of these equations are the negative of one-half of the
partial derivatives of (14.04) with respect to $b_1, b_2, \ldots, b_k$.
Equations (14.05) may be written in the form

$$\begin{aligned}
b_1 \sum x_1^2 + b_2 \sum x_1 x_2 + \cdots + b_k \sum x_1 x_k &= \sum x_1 y \\
b_1 \sum x_1 x_2 + b_2 \sum x_2^2 + \cdots + b_k \sum x_2 x_k &= \sum x_2 y \\
&\;\;\vdots \\
b_1 \sum x_1 x_k + b_2 \sum x_2 x_k + \cdots + b_k \sum x_k^2 &= \sum x_k y
\end{aligned} \tag{14.06}$$

The set of equations 
computing the necessar 


If the original measures instead of the deviations are used, these values
of the b's may be substituted in Equation (14.02). The value of
$a_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2 - \cdots - b_k \bar{X}_k$, where the bars
denote the mean values of the several variates.
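The route from raw scores to the b's and to $a_0$ can be sketched in a few lines. This is a minimal illustration, not the computing scheme of the text: the data are made up so that Y = 1 + 2X1 + 3X2 exactly, the function names are hypothetical, and plain Gaussian elimination stands in for any particular hand-computing method.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(xs, ys):
    """Return (a0, b) from the normal equations in deviation form."""
    n = len(ys)
    y_bar = sum(ys) / n
    x_bars = [sum(col) / n for col in xs]
    y = [v - y_bar for v in ys]
    x = [[v - xb for v in col] for col, xb in zip(xs, x_bars)]
    k = len(xs)
    # Left members: sums of squares and products of the x deviations.
    sxx = [[sum(p * q for p, q in zip(x[i], x[j])) for j in range(k)]
           for i in range(k)]
    # Right members: sums of products of each x with y.
    sxy = [sum(p * q for p, q in zip(x[i], y)) for i in range(k)]
    b = solve(sxx, sxy)
    a0 = y_bar - sum(bi * xb for bi, xb in zip(b, x_bars))
    return a0, b


# Hypothetical data constructed so that Y = 1 + 2*X1 + 3*X2 exactly.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y_obs = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a0, b = fit([x1, x2], y_obs)
print(round(a0, 6), [round(v, 6) for v in b])
```

Because the made-up criterion is an exact linear function of the two predictors, the recovered constants are the ones used to build it, which makes the sketch easy to check by eye.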


The accuracy with which the regression coefficients or weights enable
us to predict or estimate the values of the criterion variable is determined
by computing the multiple correlation coefficient. This may be inter-
preted as the zero order, or total correlation coefficient between the actual
values of $Y_t$ and the values $Y_t'$ predicted from the multiple regression
equation (14.02). The development of multiple R as a measure of the
accuracy of prediction of a multiple regression equation may be observed
as follows:

$$\chi^2 = \sum_{t=1}^{N} (Y_t - Y_t')^2 \tag{14.07}$$

Let R represent the correlation between the two sets of scores, $Y_t$
and $Y_t'$; and let $\sum y^2 = \sum (Y_t - \bar{Y})^2$ = the sum of squares about the
mean of $Y_t$. Equation (14.07) may then be written

$$\chi^2 = \sum y^2 (1 - R^2) \tag{14.08}$$

from which it follows that the multiple correlation coefficient, R, is the
measure of the accuracy with which the criterion scores may be pre-
dicted. It may also be pointed out that the multiple correlation is
another case of the analysis of variance, that is, of analyzing $\sum y^2$ into two
parts, one associated with regression and the other a residual.

The value of R, the multiple correlation coefficient, may be readily
calculated from the following equation:

$$R^2 = \frac{b_1 \sum (x_1 y) + b_2 \sum (x_2 y) + \cdots + b_k \sum (x_k y)}{\sum y^2} \tag{14.09}$$

The normal equations (14.06) may be modified by dividing both
members of the first equation by $\sqrt{\sum x_1^2 \sum y^2}$, both members of the
second equation by $\sqrt{\sum x_2^2 \sum y^2}$, and so on, and both members of the
kth equation by $\sqrt{\sum x_k^2 \sum y^2}$. This modification yields the following system:

$$\begin{aligned}
\beta_1 + \beta_2 r_{12} + \cdots + \beta_k r_{1k} &= r_{1y} \\
\beta_1 r_{12} + \beta_2 + \cdots + \beta_k r_{2k} &= r_{2y} \\
&\;\;\vdots \\
\beta_1 r_{1k} + \beta_2 r_{2k} + \cdots + \beta_k &= r_{ky}
\end{aligned} \tag{14.10}$$

where $\beta_1 = b_1 \sqrt{\sum x_1^2 / \sum y^2}$, $\beta_2 = b_2 \sqrt{\sum x_2^2 / \sum y^2}$, . . . , and
$\beta_k = b_k \sqrt{\sum x_k^2 / \sum y^2}$. The β's are known as the standard partial
regression coefficients, to distinguish them from the b's, the partial
regression coefficients. The β's are the partial regression coefficients for
the variates expressed in standard measure form, thus rendering them
independent of the original units of measurement and giving measures of
the comparative weight attributable to each of the independent variates.
In terms of the β's, the multiple correlation coefficient is given as

$$R^2_{y.12 \ldots k} = \beta_1 r_{1y} + \beta_2 r_{2y} + \cdots + \beta_k r_{ky} \tag{14.11}$$
A systematic procedure often used for the solution of the system of
normal equations (14.06) or (14.10) is known as the Doolittle method,
after its formulator, an engineer with the United States Coast and Geo-
detic Survey. Doolittle, in 1878, introduced a method which embodied
various improvements over Gauss's method of solving simultaneous
linear equations by direct substitution. Some modifications of Doo-
little's method have been made from time to time, but the essential fea-
tures of his method persist (Refs. 8 and 9). This method is applied
below, but first it is desirable to enumerate what is involved in the com-
plete analysis of a multiple regression problem.

We have described above the method of setting up the multiple
regression equation and of calculating the criterion of its predictive
accuracy, the multiple correlation coefficient. The values of the b's
or the β's alone, however, give a very incomplete description of the rela-
tionships between the dependent variable, Y, and the independent
variates, $X_1, \ldots, X_k$. They do not indicate whether all, or if not all,
which, of the independent variates are significantly related to the
dependent variate; nor can the confidence intervals or fiducial limits be
specified from them within Which the true val 
cients are to be found. The standard error o 


Y. Occasionally, there 
between a certain set of 
of several dependent variates. Finally, 
n made from the multiple 


f auxiliary quantities, C, 
pl, C», . “9 Cox are the 


k 
b = Д, Cap.) Go1,2...,p (14.12) 


For example, for the case of 3 independent variates the 3 systems of
equations are obtained by using for the right members of the equations
1, 0, 0 for the first system; 0, 1, 0 for the second system; and 0, 0, 1 for
the third system:

$$\begin{aligned}
A_1 \sum x_1^2 + A_2 \sum x_1 x_2 + A_3 \sum x_1 x_3 &= 1 \quad 0 \quad 0 \\
A_1 \sum x_1 x_2 + A_2 \sum x_2^2 + A_3 \sum x_2 x_3 &= 0 \quad 1 \quad 0 \\
A_1 \sum x_1 x_3 + A_2 \sum x_2 x_3 + A_3 \sum x_3^2 &= 0 \quad 0 \quad 1
\end{aligned} \tag{14.13}$$

The three solutions for these three sets of equations may be written

$$\begin{aligned}
A_1 &= C_{11},\; C_{12},\; C_{13} \\
A_2 &= C_{12},\; C_{22},\; C_{23} \\
A_3 &= C_{13},\; C_{23},\; C_{33}
\end{aligned} \tag{14.14}$$

Once the 6 values of C are known, then the partial regression coefficients
may be obtained in any particular case by calculating $\sum x_1 y$, $\sum x_2 y$, $\sum x_3 y$
and substituting in the following formulas:

$$\begin{aligned}
b_1 &= C_{11} \sum x_1 y + C_{12} \sum x_2 y + C_{13} \sum x_3 y \\
b_2 &= C_{12} \sum x_1 y + C_{22} \sum x_2 y + C_{23} \sum x_3 y \\
b_3 &= C_{13} \sum x_1 y + C_{23} \sum x_2 y + C_{33} \sum x_3 y
\end{aligned} \tag{14.15}$$

Problem XIV.1. The complete analysis of a regression problem.
We shall illustrate the complete analysis of a regression problem as it was
carried out in a study of predicting in the School of Agriculture in the
University of Minnesota. In this problem it was of interest to secure
the correlation coefficients between the several variates. Furthermore,
the use of correlation coefficients in the normal equations provides the
same order of magnitude for all the quantities at any given step in the
solution. Their use is also advantageous in the use of the check column
to be described later. The standard partial regression coefficients rather
than the partial regression coefficients are used because of the interest
in comparing the relative importance of the independent variates, which
originally were in different units of measurement. For this case the
auxiliary set of quantities, the C's used for securing the b's, have been
supplanted by what we call the g's for securing the β's.

We have observed 213 individuals with 1 dependent variable and
5 independent variables. Let us denote the dependent variable or the
criterion by $Y$, and the independent variables by $X_1$, $X_2$, $X_3$, $X_4$, and $X_5$.
The scores observed are as follows:

    $Y$:   honor-point ratios
    $X_1$:  age
    $X_2$:  Iowa Silent Reading Test score
    $X_3$:  Otis raw score
    $X_4$:  previous education in years
    $X_5$:  School of Agriculture Reading Test total score

We wish to predict the honor-point ratio from the measures of the
independent variates. The following steps are pursued:
Step 1. Compute all the intercorrelations and the standard devia-
tions. Let us define:

$$s_i^2 = \frac{\sum x_i^2}{N - 1}, \qquad r_{ij} = \frac{\sum x_i x_j}{(N - 1)\, s_i s_j}, \qquad r_{iy} = \frac{\sum x_i y}{(N - 1)\, s_i s_y}$$

where $i \neq j$ and $i, j = 1, \ldots, 5$. All the measures in our case are
summarized as follows:

    N = 213
    $\bar{X}_1$ = 15.9296;  $\bar{X}_2$ = 161.9061;  $\bar{X}_3$ = 37.8498;  $\bar{X}_4$ = 8.9531;
    $\bar{X}_5$ = 90.0235;  $\bar{Y}$ = 2.3362

    $s_1^2$ = 5.34245869;  $s_2^2$ = 246.33860141;  $s_3^2$ = 109.11357981;
    $s_4^2$ = 5.26540141;  $s_5^2$ = 587.14968357;  $s_y^2$ = .70191238

    $s_1 s_2$ = 36.27745774;   $s_1 s_3$ = 24.14404430;   $s_1 s_4$ = 5.30378969;
    $s_1 s_5$ = 56.00734941;   $s_2 s_3$ = 163.94782712;  $s_2 s_4$ = 36.01487742;
    $s_2 s_5$ = 380.31255769;  $s_3 s_4$ = 23.96928698;   $s_3 s_5$ = 253.11264376;
    $s_4 s_5$ = 55.60196191

    $r_{12}$ = .0300;  $r_{13}$ = .0983;  $r_{14}$ = .1100;  $r_{15}$ = .0470;  $r_{23}$ = .7143;
    $r_{24}$ = .0960;  $r_{25}$ = .8203;  $r_{34}$ = .1821;  $r_{35}$ = .7230;  $r_{45}$ = .1124

    $r_{1y}$ = .1784;  $r_{2y}$ = .6505;  $r_{3y}$ = .5164;  $r_{4y}$ = .0993;  $r_{5y}$ = .6704¹

Step 2. Compute Fisher's auxiliary statistics, the $g_{ij}$'s. The 5 systems
of simultaneous equations to be solved are

                                                                  Right members of system
                                                                  (1)  (2)  (3)  (4)  (5)
    $g_1 + r_{12} g_2 + r_{13} g_3 + r_{14} g_4 + r_{15} g_5$  =   1    0    0    0    0
    $r_{12} g_1 + g_2 + r_{23} g_3 + r_{24} g_4 + r_{25} g_5$  =   0    1    0    0    0
    $r_{13} g_1 + r_{23} g_2 + g_3 + r_{34} g_4 + r_{35} g_5$  =   0    0    1    0    0     (14.16)
    $r_{14} g_1 + r_{24} g_2 + r_{34} g_3 + g_4 + r_{45} g_5$  =   0    0    0    1    0
    $r_{15} g_1 + r_{25} g_2 + r_{35} g_3 + r_{45} g_4 + g_5$  =   0    0    0    0    1

The values obtained for the g's in the first system will be designated
by $g_{11}$, $g_{21}$, $g_{31}$, $g_{41}$, and $g_{51}$. The values obtained for the g's in the second,
the third, the fourth, and the fifth systems will be designated by $g_{12}$, $g_{22}$,
$g_{32}$, $g_{42}$, and $g_{52}$; by $g_{13}$, $g_{23}$, $g_{33}$, $g_{43}$, and $g_{53}$; by $g_{14}$, $g_{24}$, $g_{34}$, $g_{44}$,
and $g_{54}$; and by $g_{15}$, $g_{25}$, $g_{35}$, $g_{45}$, and $g_{55}$, respectively. It is worthy of
note that

$$g_{ij} = g_{ji} \qquad (i \neq j;\; i, j = 1, \ldots, 5) \tag{14.17}$$


1 We have used 4 decimal 


i This is li ч inimum 
number with the number of equations and of unknowns p^ ep A nrbes 
equations and unknowns Increases, the Dooli 
[Table 100. Solution of the five systems of equations (14.16) by the
Doolittle method, with check column.]
iond 
For our problem, (14.16) becomes

    (I):    $1.0000g_1 + .0300g_2 + .0983g_3 + .1100g_4 + .0470g_5$  =  1  0  0  0  0
    (II):   $.0300g_1 + 1.0000g_2 + .7143g_3 + .0960g_4 + .8203g_5$  =  0  1  0  0  0
    (III):  $.0983g_1 + .7143g_2 + 1.0000g_3 + .1821g_4 + .7230g_5$  =  0  0  1  0  0
    (IV):   $.1100g_1 + .0960g_2 + .1821g_3 + 1.0000g_4 + .1124g_5$  =  0  0  0  1  0
    (V):    $.0470g_1 + .8203g_2 + .7230g_3 + .1124g_4 + 1.0000g_5$  =  0  0  0  0  1


A systematic procedure often used for the solution of such a system 
of equations is shown in Table 100. A convenient check column is often 
carried along, to the right of these computations. The first and second 
entries of this check column are found by adding all other entries in their 
respective rows. The third entry is found in two ways, thus yielding a 
check on the accuracy of the arithmetical computations. The first way 
consists in the addition of all other entries in the third row. The other 
way consists in operating on the first entry in accordance with the direc- 


tions given at the left. The other entries in the check column are found
in a similar way.²


The values of $g_{51}$, $g_{52}$, $g_{53}$, $g_{54}$, and $g_{55}$ can be read directly from the
last row, numbered (23):

    $g_{51} = -.0024$;  $g_{52} = -2.1493$;  $g_{53} = -.9678$;  $g_{54} = -.0062$;
    $g_{55} = 3.4638$

We get $g_{41}$, $g_{42}$, $g_{43}$, $g_{44}$, and $g_{45}$ as follows:

Substitute $g_{51}$ in Eq. (16, Table 100) and use column F in the right-
hand member:

    $g_{41} + .0018(-.0024) = -.0946$
    $g_{41} = -.0946$

Substitute $g_{52}$ in Eq. (16, Table 100) and use column F' in the right
member:

    $g_{42} + .0018(-2.1493) = .0649$
    $g_{42} = .0688$

Substitute $g_{53}$ in Eq. (16, Table 100) and use column F'' in the right
member:

    $g_{43} + .0018(-.9678) = -.2275$
    $g_{43} = -.2258$


² It should be noted that errors occurring in the rounding of the original corre-
lations are not accounted for by the check column.
Substitute $g_{54}$ in Eq. (16, Table 100) and use column F''' in the right
member:

    $g_{44} + .0018(-.0062) = 1.0456$
    $g_{44} = 1.0456$

Substitute $g_{55}$ in Eq. (16, Table 100) and use column F'''' in the right
member:

    $g_{45} + .0018(3.4638) = 0$
    $g_{45} = -.0062$

To obtain $g_{31}$, $g_{32}$, $g_{33}$, $g_{34}$, and $g_{35}$:

Substitute $g_{41}$ and $g_{51}$ in Eq. (10, Table 100) and use F in the right
member:

    $g_{31} + .2176(-.0946) + .2798(-.0024) = -.1589$
    $g_{31} = -.1377$

Substitute $g_{42}$ and $g_{52}$ in Eq. (10, Table 100) and use F' in the right
member:

    $g_{32} + .2176(.0688) + .2798(-2.1493) = -1.4712$
    $g_{32} = -.8847$

Substitute $g_{43}$ and $g_{53}$ in Eq. (10, Table 100) and use F'' in the right
member:

    $g_{33} + .2176(-.2258) + .2798(-.9678) = 2.0665$
    $g_{33} = 2.3864$

Substitute $g_{44}$ and $g_{54}$ in Eq. (10, Table 100) and use F''' in the right
member:

    $g_{34} + .2176(1.0456) + .2798(-.0062) = 0$
    $g_{34} = -.2258$

Substitute $g_{45}$ and $g_{55}$ in Eq. (10, Table 100) and use F'''' in the right
member:

    $g_{35} + .2176(-.0062) + .2798(3.4638) = 0$
    $g_{35} = -.9678$

To obtain $g_{21}$, $g_{22}$, $g_{23}$, $g_{24}$, and $g_{25}$:

Substitute $g_{31}$, $g_{41}$, and $g_{51}$ in Eq. (5, Table 100) and use F in the
right member:

    $g_{21} + .7119(-.1377) + .0928(-.0946) + .8196(-.0024) = -.0300$
    $g_{21} = .0788$

Substitute $g_{32}$, $g_{42}$, and $g_{52}$ in Eq. (5, Table 100) and use F' in the right
member:

    $g_{22} + .7119(-.8847) + .0928(.0688) + .8196(-2.1493) = 1.0009$
    $g_{22} = 3.3859$

Substitute $g_{33}$, $g_{43}$, and $g_{53}$ in Eq. (5, Table 100) and use F'' in the right
member:

    $g_{23} + .7119(2.3864) + .0928(-.2258) + .8196(-.9678) = 0$
    $g_{23} = -.8847$

Substitute $g_{34}$, $g_{44}$, and $g_{54}$ in Eq. (5, Table 100) and use F''' in the
right member:

    $g_{24} + .7119(-.2258) + .0928(1.0456) + .8196(-.0062) = 0$
    $g_{24} = .0688$

Substitute $g_{35}$, $g_{45}$, and $g_{55}$ in Eq. (5, Table 100) and use F'''' in the
right member:

    $g_{25} + .7119(-.9678) + .0928(-.0062) + .8196(3.4638) = 0$
    $g_{25} = -2.1493$

To obtain $g_{11}$, $g_{12}$, $g_{13}$, $g_{14}$, and $g_{15}$:

Substitute $g_{21}$, $g_{31}$, $g_{41}$, and $g_{51}$ in Eq. (1, Table 100) and use F in the
right member:

    $g_{11} + .0300(.0788) + .0983(-.1377) + .1100(-.0946) + .0470(-.0024) = 1$
    $g_{11} = 1.0217$

Substitute $g_{22}$, $g_{32}$, $g_{42}$, and $g_{52}$ in Eq. (1, Table 100) and use F' in the
right member:

    $g_{12} + .0300(3.3859) + .0983(-.8847) + .1100(.0688) + .0470(-2.1493) = 0$
    $g_{12} = .0788$

Substitute $g_{23}$, $g_{33}$, $g_{43}$, and $g_{53}$ in Eq. (1, Table 100) and use F'' in the
right member:

    $g_{13} + .0300(-.8847) + .0983(2.3864) + .1100(-.2258) + .0470(-.9678) = 0$
    $g_{13} = -.1377$

Substitute $g_{24}$, $g_{34}$, $g_{44}$, and $g_{54}$ in Eq. (1, Table 100) and use F''' in the
right member:

    $g_{14} + .0300(.0688) + .0983(-.2258) + .1100(1.0456) + .0470(-.0062) = 0$
    $g_{14} = -.0946$

Substitute $g_{25}$, $g_{35}$, $g_{45}$, and $g_{55}$ in Eq. (1, Table 100) and use F'''' in the
right member:

    $g_{15} + .0300(-2.1493) + .0983(-.9678) + .1100(-.0062) + .0470(3.4638) = 0$
    $g_{15} = -.0024$


The accuracy of the $g_{ij}$'s ($i \neq j$) can be checked by the relation
$g_{ij} = g_{ji}$; and the accuracy of the $g_{ii}$'s can be checked by a method
illustrated by Wallace and Snedecor (Ref. 25). It is shown that to
obtain $g_{11}$, the sum of products of the last two members (regardless of
sign) in each section in column (F) is found; similarly, for $g_{22}$ the same
procedure is followed in column (F'), and so on. In our problem

    $g_{11} = 1 + .0300(.0300) + .0769(.1589) + .0905(.0946) + .0007(.0024) = 1.0217$
    $g_{22} = 1.0000(1.0009) + .7119(1.4712) + .0621(.0649) + .6205(2.1493) = 3.3859$
    $g_{33} = 1.0000(2.0665) + .2176(.2275) + .2794(.9678) = 2.3864$
    $g_{44} = 1.0000(1.0456) + .0018(.0062) = 1.0456$
    $g_{55} = 1.0000(3.4638) = 3.4638$
Since we have checked all our results, we shall summarize them as 
follows: 


$g_{11} = 1.0217$; $g_{12} = .0788$; $g_{13} = -.1377$; $g_{14} = -.0946$; $g_{15} = -.0024$
$g_{21} = .0788$; $g_{22} = 3.3859$; $g_{23} = -.8847$; $g_{24} = .0688$; $g_{25} = -2.1493$
$g_{31} = -.1377$; $g_{32} = -.8847$; $g_{33} = 2.3864$; $g_{34} = -.2258$; $g_{35} = -.9678$
$g_{41} = -.0946$; $g_{42} = .0688$; $g_{43} = -.2258$; $g_{44} = 1.0456$; $g_{45} = -.0062$
$g_{51} = -.0024$; $g_{52} = -2.1493$; $g_{53} = -.9678$; $g_{54} = -.0062$; $g_{55} = 3.4638$
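The back-substitution for the first row of auxiliary statistics can be retraced numerically. The following is only an illustrative sketch: it assumes that the coefficients .0300, .0983, .1100, and .0470 are those of Eq. (1, Table 100) as they appear in the substitutions above, and the name `first_row` is hypothetical.

```python
# Coefficients multiplying g_2j ... g_5j in Eq. (1, Table 100), as read from the text.
COEF = [0.0300, 0.0983, 0.1100, 0.0470]

# Rows 2 to 5 of the symmetric table of auxiliary statistics g_ij.
G_LOWER = [
    [0.0788, 3.3859, -0.8847, 0.0688, -2.1493],
    [-0.1377, -0.8847, 2.3864, -0.2258, -0.9678],
    [-0.0946, 0.0688, -0.2258, 1.0456, -0.0062],
    [-0.0024, -2.1493, -0.9678, -0.0062, 3.4638],
]


def first_row(coef, lower):
    """Solve g_1j + sum_k coef[k] * g_(k+2)j = right member, for j = 1..5."""
    row = []
    for j in range(5):
        right_member = 1.0 if j == 0 else 0.0  # columns F, F', F'', ...
        carried = sum(c * lower[k][j] for k, c in enumerate(coef))
        row.append(right_member - carried)
    return row


g1 = first_row(COEF, G_LOWER)
print([round(v, 4) for v in g1])
```

Because $g_{ij} = g_{ji}$, the printed row should agree, to about four decimals, with the first column of the rows supplied as input, which is the symmetry check described in the text.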


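Step 3. Compute $R_{y.12345}$, the multiple correlation between $Y$ and $X_1$, $X_2$, $X_3$, $X_4$, $X_5$.

Define:

$$\beta_i = \sum_j g_{ij} r_{jy} \qquad (i, j = 1, \ldots, 5) \qquad (14.18)$$

where $\beta_i$ is the standard partial regression coefficient. For our problem, we have

$\beta_1 = g_{11}r_{1y} + g_{12}r_{2y} + g_{13}r_{3y} + g_{14}r_{4y} + g_{15}r_{5y} = .1514$
$\beta_2 = g_{21}r_{1y} + g_{22}r_{2y} + g_{23}r_{3y} + g_{24}r_{4y} + g_{25}r_{5y} = .3256$
$\beta_3 = g_{31}r_{1y} + g_{32}r_{2y} + g_{33}r_{3y} + g_{34}r_{4y} + g_{35}r_{5y} = -.0390$
$\beta_4 = g_{41}r_{1y} + g_{42}r_{2y} + g_{43}r_{3y} + g_{44}r_{4y} + g_{45}r_{5y} = .0109$
$\beta_5 = g_{51}r_{1y} + g_{52}r_{2y} + g_{53}r_{3y} + g_{54}r_{4y} + g_{55}r_{5y} = .4232$

Define again:

$$R^2_{y.12345} = \sum_i \beta_i r_{iy} \qquad (i = 1, \ldots, 5) \qquad (14.19)$$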

For our problem, we have

$R^2_{y.12345} = \beta_1 r_{1y} + \beta_2 r_{2y} + \beta_3 r_{3y} + \beta_4 r_{4y} + \beta_5 r_{5y} = .50346861$

Therefore, we obtain

$R_{y.12345} = .7096$

° This does not provide a complete guarantee of accuracy, since in some instances a solution might give only small deviations of the left from the right member of the equation, and since some deviations are to be expected when only a limited number of decimal places are carried along.


Step 4. Test the significance of $R_{y.12345}$. This can be done through the use of the variance ratio. The method is shown below.

ANALYSIS-OF-VARIANCE TABLE

Source of variation               Sum of squares         D.F.          Mean square
Not associated with regression    $(1 - R^2)\Sigma y^2$    $N - m - 1$   $(1 - R^2)\Sigma y^2 / (N - m - 1)$
Associated with regression        $R^2 \Sigma y^2$         $m$           $R^2 \Sigma y^2 / m$
Total                             $\Sigma y^2$             $N - 1$

$$F \text{ (variance ratio)} = \frac{R^2 (N - m - 1)}{m (1 - R^2)} \qquad (14.20)$$

For our problem,

$$F = \frac{.5035(207)}{5(.4965)} = 41.98$$

Referring to the F tables (Table IV, Appendix) with $n_1 = 5$ and $n_2 = 207$, we have $P < .01$. Therefore, we conclude that the value of the multiple correlation is significantly different from zero.

Step 5. Test the significance of the $(\beta_i)$'s.

Define:

$$s_{\beta_i} = \sqrt{\frac{(1 - R^2_{y.12 \cdots m})\, g_{ii}}{N - m - 1}} \qquad (i = 1, \ldots, m) \qquad (14.21)$$

where $s_{\beta_i}$ is the standard error of $\beta_i$. For our problem we have, for example:

$s_{\beta_2} = \sqrt{(1 - R^2_{y.12345})\, g_{22} / (N - m - 1)} = .0901$
$s_{\beta_5} = \sqrt{(1 - R^2_{y.12345})\, g_{55} / (N - m - 1)} = .0911$

The test of significance of each $\beta_i$ is given by

$$t_{\beta_i} = \frac{\beta_i}{s_{\beta_i}} \qquad (14.22)$$

with $N - m - 1$ degrees of freedom. For our problem, we obtain

$t_{\beta_1} = 3.059$; $t_{\beta_2} = 3.614$; $t_{\beta_3} = -.515$; $t_{\beta_4} = .218$; $t_{\beta_5} = 4.645$

Referring to the t-table with 207 degrees of freedom, we find that $\beta_1$, $\beta_2$, and $\beta_5$ are significant at the 1 per cent level, and that $\beta_3$ and $\beta_4$ are not significantly different from zero. Therefore, we can omit the independent variables $X_3$ and $X_4$.*
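The variance ratio of Step 4 and the t-values of Step 5 can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: N = 213 is inferred from N - m - 1 = 207, the second coefficient is read as .3256, and the function names are hypothetical.

```python
import math


def variance_ratio(r2, n, m):
    """F = R^2 (N - m - 1) / (m (1 - R^2)), as in Eq. (14.20)."""
    return r2 * (n - m - 1) / (m * (1.0 - r2))


def beta_tests(betas, g_diag, r2, n, m):
    """Return (standard error, t) for each beta, following Eqs. (14.21) and (14.22)."""
    results = []
    for beta, g_ii in zip(betas, g_diag):
        se = math.sqrt((1.0 - r2) * g_ii / (n - m - 1))
        results.append((se, beta / se))
    return results


N, M = 213, 5                      # assumption: N inferred from 207 degrees of freedom
R2 = 0.50346861                    # squared multiple correlation from Step 3
BETAS = [0.1514, 0.3256, -0.0390, 0.0109, 0.4232]
G_DIAG = [1.0217, 3.3859, 2.3864, 1.0456, 3.4638]   # g_11 ... g_55

F = variance_ratio(R2, N, M)
tests = beta_tests(BETAS, G_DIAG, R2, N, M)
print(round(F, 2))
for se, t in tests:
    print(round(se, 4), round(t, 3))
```

Small differences in the last decimal from the printed values are to be expected, since the text carries rounded intermediate figures.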

Step 6. The omission of $X_3$: Let us denote by $\beta'_i$ and $g'_{ij}$ the new standard regression coefficients and auxiliary statistics, respectively. By mathematical derivations, we have

$$\beta'_i = \beta_i - \frac{g_{i3}}{g_{33}}\,\beta_3 \qquad (i = 1, 2, 4, 5) \qquad (14.23)$$

$$g'_{ij} = g_{ij} - \frac{g_{i3}\, g_{j3}}{g_{33}} \qquad (i, j = 1, 2, 4, 5) \qquad (14.24)$$

For our problem, we can easily obtain

$\beta'_1 = \beta_1 - (g_{13}/g_{33})\beta_3 = .1492$;  $\beta'_2 = \beta_2 - (g_{23}/g_{33})\beta_3 = .3112$
$\beta'_4 = \beta_4 - (g_{43}/g_{33})\beta_3 = .0072$;  $\beta'_5 = \beta_5 - (g_{53}/g_{33})\beta_3 = .4074$

$g'_{11} = g_{11} - g_{13}^2/g_{33} = 1.0138$;  $g'_{12} = g_{12} - g_{13}g_{23}/g_{33} = .0278$
$g'_{14} = g_{14} - g_{13}g_{43}/g_{33} = -.1076$;  $g'_{15} = g_{15} - g_{13}g_{53}/g_{33} = -.0582$
$g'_{22} = g_{22} - g_{23}^2/g_{33} = 3.0579$;  $g'_{24} = g_{24} - g_{23}g_{43}/g_{33} = -.0149$
$g'_{25} = g_{25} - g_{23}g_{53}/g_{33} = -2.5081$;  $g'_{44} = g_{44} - g_{43}^2/g_{33} = 1.0242$
$g'_{45} = g_{45} - g_{43}g_{53}/g_{33} = -.0978$;  $g'_{55} = g_{55} - g_{53}^2/g_{33} = 3.0713$

Proceeding as before, we have

$R'^2_{y.1245} = \beta'_1 r_{1y} + \beta'_2 r_{2y} + \beta'_4 r_{4y} + \beta'_5 r_{5y} = .5028880$

$R'_{y.1245} = .7091$

$$F = \frac{R'^2 (N - m' - 1)}{m' (1 - R'^2)} = 52.6067$$

where $m' = 4$. Referring to the F-table with $n_1 = 4$ and $n_2 = 208$, we find that $P < .01$. Therefore, we conclude that the $R$ is significant.

* For a discussion of "suppression" variables which might increase the multiple correlation even though they correlate zero or near zero with the criterion, see Refs. 14 and 27.

In testing the significance of the $(\beta'_i)$'s, we obtain

$s_{\beta'_1} = \sqrt{(1 - R'^2_{y.1245})\, g'_{11} / (N - m' - 1)} = .0492$
$s_{\beta'_2} = \sqrt{(1 - R'^2_{y.1245})\, g'_{22} / (N - m' - 1)} = .0855$
$s_{\beta'_4} = \sqrt{(1 - R'^2_{y.1245})\, g'_{44} / (N - m' - 1)} = .0495$
$s_{\beta'_5} = \sqrt{(1 - R'^2_{y.1245})\, g'_{55} / (N - m' - 1)} = .0857$

Consequently, we obtain

$t_{\beta'_1} = \beta'_1 / s_{\beta'_1} = 3.033$;  $t_{\beta'_2} = \beta'_2 / s_{\beta'_2} = 3.640$
$t_{\beta'_4} = \beta'_4 / s_{\beta'_4} = .145$;  $t_{\beta'_5} = \beta'_5 / s_{\beta'_5} = 4.754$

Referring to the t-table with 208 degrees of freedom, we find that $\beta'_1$, $\beta'_2$, and $\beta'_5$ are significant at the 1 per cent level and that $\beta'_4$ is not significantly different from 0. Therefore, it is desirable to omit the independent variable $X_4$.
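The adjustment made when a variable is omitted, Eqs. (14.23) and (14.24), lends itself to a small routine that can be applied repeatedly. The sketch below is illustrative only: the function name `drop_variable` and the dictionary layout are assumptions, and the second coefficient is read as .3256.

```python
VARS = [1, 2, 3, 4, 5]
G_ROWS = [
    [1.0217, 0.0788, -0.1377, -0.0946, -0.0024],
    [0.0788, 3.3859, -0.8847, 0.0688, -2.1493],
    [-0.1377, -0.8847, 2.3864, -0.2258, -0.9678],
    [-0.0946, 0.0688, -0.2258, 1.0456, -0.0062],
    [-0.0024, -2.1493, -0.9678, -0.0062, 3.4638],
]
# g[i][j] and beta[i] are keyed by the variable numbers used in the text.
G = {i: {j: G_ROWS[a][b] for b, j in enumerate(VARS)} for a, i in enumerate(VARS)}
BETA = {1: 0.1514, 2: 0.3256, 3: -0.0390, 4: 0.0109, 5: 0.4232}


def drop_variable(g, beta, k):
    """Omit variable k: beta'_i = beta_i - (g_ik / g_kk) beta_k,
    g'_ij = g_ij - g_ik g_jk / g_kk."""
    keep = [i for i in beta if i != k]
    new_beta = {i: beta[i] - g[i][k] / g[k][k] * beta[k] for i in keep}
    new_g = {
        i: {j: g[i][j] - g[i][k] * g[j][k] / g[k][k] for j in keep}
        for i in keep
    }
    return new_g, new_beta


g_one, beta_one = drop_variable(G, BETA, 3)          # omit X3
g_two, beta_two = drop_variable(g_one, beta_one, 4)  # then omit X4
print({i: round(b, 4) for i, b in beta_one.items()})
print({i: round(b, 4) for i, b in beta_two.items()})
```

Applying the routine twice avoids re-solving the normal equations for each reduced set of variables, which is the economy the auxiliary statistics are meant to provide.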


Step 7. The omission of both $X_3$ and $X_4$: Let us denote by $\beta''_i$ and $g''_{ij}$ the new standard regression coefficients and auxiliary statistics, respectively. By mathematical derivations, we have

$$\beta''_i = \beta'_i - \frac{g'_{i4}}{g'_{44}}\,\beta'_4 \qquad (i = 1, 2, 5) \qquad (14.25)$$

$$g''_{ij} = g'_{ij} - \frac{g'_{i4}\, g'_{j4}}{g'_{44}} \qquad (i, j = 1, 2, 5) \qquad (14.26)$$

For our problem,

$\beta''_1 = \beta'_1 - (g'_{14}/g'_{44})\beta'_4 = .1500$;  $\beta''_2 = \beta'_2 - (g'_{24}/g'_{44})\beta'_4 = .3113$
$\beta''_5 = \beta'_5 - (g'_{54}/g'_{44})\beta'_4 = .4081$

$g''_{11} = g'_{11} - g'^2_{14}/g'_{44} = 1.0025$;  $g''_{12} = g'_{12} - g'_{14}g'_{24}/g'_{44} = .0262$
$g''_{15} = g'_{15} - g'_{14}g'_{54}/g'_{44} = -.0685$;  $g''_{22} = g'_{22} - g'^2_{24}/g'_{44} = 3.0577$
$g''_{25} = g'_{25} - g'_{24}g'_{54}/g'_{44} = -2.5095$;  $g''_{55} = g'_{55} - g'^2_{54}/g'_{44} = 3.0620$

... use of the g-statistics over the method of resolving the normal equations depends upon the number ...

Proceeding as before, we have

$R''^2_{y.125} = \beta''_1 r_{1y} + \beta''_2 r_{2y} + \beta''_5 r_{5y} = .50285089$

$R''_{y.125} = .7091$

$$F = \frac{R''^2 (N - m'' - 1)}{m'' (1 - R''^2)} = 70.47$$

where $m'' = 3$. Referring to the F-table with $n_1 = 3$ and $n_2 = 209$, we have $P < .01$. Therefore, we conclude that the $R$ is significantly different from zero.

In testing the significance of the standard partial regression coefficients, we have

$s_{\beta''_1} = \sqrt{(1 - R''^2_{y.125})\, g''_{11} / (N - m'' - 1)} = .0488$
$s_{\beta''_2} = \sqrt{(1 - R''^2_{y.125})\, g''_{22} / (N - m'' - 1)} = .0853$
$s_{\beta''_5} = \sqrt{(1 - R''^2_{y.125})\, g''_{55} / (N - m'' - 1)} = .0853$

Consequently, we have

$t_{\beta''_1} = \beta''_1 / s_{\beta''_1} = 3.074$;  $t_{\beta''_2} = \beta''_2 / s_{\beta''_2} = 3.649$;  $t_{\beta''_5} = \beta''_5 / s_{\beta''_5} = 4.784$

Referring to the t-table with 209 degrees of freedom, we find that all the standard partial regression coefficients are significant. Therefore, we conclude that the three independent variables $X_1$, $X_2$, and $X_5$ should be used in order to predict the dependent variable $Y$.
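Step 8. Set up the formula for the prediction of $Y$ from $X_1$, $X_2$, and $X_5$. First we calculate the partial regression coefficients. Define:

$$b_i = \beta_i \frac{s_y}{s_i} \qquad (i = 1, \ldots, 5) \qquad (14.27)$$

Similarly,

$$b'_i = \beta'_i \frac{s_y}{s_i} \qquad (i = 1, 2, 4, 5) \qquad (14.28)$$

$$b''_i = \beta''_i \frac{s_y}{s_i} \qquad (i = 1, 2, 5) \qquad (14.29)$$

For our problem, we do not need to use $b_i$ and $b'_i$. Therefore, we have

$b''_1 = \beta''_1 (s_y / s_1) = .0544$;  $b''_2 = \beta''_2 (s_y / s_2) = .0166$;  $b''_5 = \beta''_5 (s_y / s_5) = .0141$

where $s_1 = 2.311376$, $s_2 = 15.695178$, and $s_5 = 24.231172$, which, like $s_y$, are calculated from the results in Step 1. Denoting by $\hat{Y}$ the predicted $Y$-score, and writing $x_i = X_i - \bar{X}_i$, we have

$$\hat{Y} = \bar{Y} + \sum_i b_i x_i \qquad (i = 1, \ldots, 5) \qquad (14.30)$$

Similarly,

$$\hat{Y} = \bar{Y} + \sum_i b'_i x_i \qquad (i = 1, 2, 4, 5) \qquad (14.31)$$

$$\hat{Y} = \bar{Y} + \sum_i b''_i x_i \qquad (i = 1, 2, 5) \qquad (14.32)$$

Again, for our problem we simply use (14.32). Then we obtain

$\hat{Y} = \bar{Y} + b''_1 (X_1 - \bar{X}_1) + b''_2 (X_2 - \bar{X}_2) + b''_5 (X_5 - \bar{X}_5)$
$\quad = 2.3362 - .0544(15.9296) - .0166(161.9061) - .0141(90.0235) + .0544X_1 + .0166X_2 + .0141X_5$
$\quad = .0544X_1 + .0166X_2 + .0141X_5 - 2.4873$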
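The prediction equation of Step 8 can be written out as a short function. This is only a sketch using the rounded coefficients and means printed above; the names `MEANS`, `B`, `intercept`, and `predict` are hypothetical.

```python
# Means and partial regression coefficients b''_i as printed in Step 8.
MEANS = {"Y": 2.3362, "X1": 15.9296, "X2": 161.9061, "X5": 90.0235}
B = {"X1": 0.0544, "X2": 0.0166, "X5": 0.0141}


def intercept(means, b):
    """Constant term: Y-bar minus the sum of b_i times X_i-bar."""
    return means["Y"] - sum(b[name] * means[name] for name in b)


def predict(scores, means=MEANS, b=B):
    """Predicted Y-score for a dict of raw scores on X1, X2, and X5."""
    return intercept(means, b) + sum(b[name] * scores[name] for name in b)


constant = intercept(MEANS, B)
y_hat = predict({"X1": 16, "X2": 163, "X5": 80})
print(round(constant, 4), round(y_hat, 3))
```

Working from deviations or from raw scores plus a constant gives the same result; the constant form is simply more convenient when many individuals are to be scored.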


It is to be noted that this predicted score refers to the true mean $Y$-score of all the individuals in the population who have the same specific scores on $X_1$, $X_2$, and $X_5$. In other words, in the long run, the true mean $Y$-score of all the individuals who have identical scores for $X_1$, $X_2$, and $X_5$ in the population will approach $\hat{Y}$ within a fiducial limit which we may set up. Since in our example we do not find two individuals with the same scores for $X_1$, $X_2$, and $X_5$, it is difficult to verify the accuracy of the predicted score.

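Step 9. Compute the standard error of an individual predicted score. The general formula using the auxiliary statistics $(g_{ij})$'s for predicting the standard error of an individual predicted score is

$$s_{\hat{Y}} = \sqrt{\frac{(1 - R^2)\, s_y^2}{N - m - 1}\left[1 + \sum_i \frac{g_{ii}\, x_i^2}{s_i^2} + 2\sum_{i<j} \frac{g_{ij}\, x_i x_j}{s_i s_j}\right]} \qquad (14.33)$$

where $i, j = 1, \ldots, m$; $m$ is the number of independent variables; $s_y^2$ is the estimate of the population variance $\sigma_y^2$; and the $s_i$'s and the $x$'s are defined as before.° For our problem, we simply use $i, j = 1, 2, 5$ and change $m$ to $m'' = 3$ and the $(g)$'s to $(g'')$'s. Thus we have

$$s_{\hat{Y}} = \sqrt{\frac{(1 - R''^2_{y.125})\, s_y^2}{N - m'' - 1}\left[1 + \frac{g''_{11}x_1^2}{s_1^2} + \frac{g''_{22}x_2^2}{s_2^2} + \frac{g''_{55}x_5^2}{s_5^2} + \frac{2g''_{12}x_1x_2}{s_1s_2} + \frac{2g''_{15}x_1x_5}{s_1s_5} + \frac{2g''_{25}x_2x_5}{s_2s_5}\right]}$$

$$= \sqrt{.001670\,[1 + .1876x_1^2 + .0124x_2^2 + .0052x_5^2 + .0014x_1x_2 - .0024x_1x_5 - .0132x_2x_5]}$$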
° The working formula can also be written as follows: 


кә АЙЧ. Клен ja m 
5; V Nemi EADE TT Ex 
i 


Zz?ZXr;* 


ij 
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Step 10. Find the fiducial limits. 
Fiducial limits $(p) = \hat{Y} \pm t_{(q)}(s_{\hat{Y}})$

where $p + q = 1$ and the t-value is obtained by referring to the t-table with $N - m - 1$ (for our problem, $N - m'' - 1$) degrees of freedom. It may be stated with a confidence coefficient of $100p$ per cent that the true mean $Y$-score of all the individuals who have identical $X_1$-, $X_2$-, and $X_5$-scores will lie in the range of $\hat{Y} \pm t_{(q)}(s_{\hat{Y}})$. It is customary to make $p = .99$ or $.95$. For our problem, $N - m'' - 1 = 209$. Referring to the t-table with 209 degrees of freedom, we find that $t_{.01} = 2.600$ and $t_{.05} = 1.972$. Therefore, we have

Fiducial limits $(.99) = \hat{Y} \pm 2.600(s_{\hat{Y}})$
Fiducial limits $(.95) = \hat{Y} \pm 1.972(s_{\hat{Y}})$
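The standard error of Step 9 and the limits of Step 10 can be evaluated together. The sketch below is an illustration under stated assumptions: it uses the numerical working form of the standard error given for this problem (constant .001670 and the six coefficients), the deviations $x_1 = .0704$, $x_2 = 1.0939$, $x_5 = -10.0235$ of the individual treated in Step 11, and hypothetical function names.

```python
import math

PREFACTOR = 0.001670                       # (1 - R''^2) s_y^2 / (N - m'' - 1), as printed
SQUARE_TERMS = {1: 0.1876, 2: 0.0124, 5: 0.0052}
CROSS_TERMS = {(1, 2): 0.0014, (1, 5): -0.0024, (2, 5): -0.0132}


def standard_error(x):
    """Standard error of a predicted score; x maps variable number to deviation x_i."""
    bracket = 1.0
    bracket += sum(c * x[i] ** 2 for i, c in SQUARE_TERMS.items())
    bracket += sum(c * x[i] * x[j] for (i, j), c in CROSS_TERMS.items())
    return math.sqrt(PREFACTOR * bracket)


def fiducial_limits(y_hat, se, t_value):
    """Lower and upper limits y_hat -/+ t * se."""
    return (y_hat - t_value * se, y_hat + t_value * se)


deviations = {1: 0.0704, 2: 1.0939, 5: -10.0235}
se = standard_error(deviations)
limits_99 = fiducial_limits(2.217, se, 2.600)
limits_95 = fiducial_limits(2.217, se, 1.972)
print(round(se, 4))
print(tuple(round(v, 3) for v in limits_99))
print(tuple(round(v, 3) for v in limits_95))
```

The t-values 2.600 and 1.972 are those read from the t-table for 209 degrees of freedom; for another sample size they would have to be looked up again.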


Step 11. Practical application: To find the fiducial limits for the true mean $Y$-score for the following individual values:

$X_1 = 16$; $X_2 = 163$; $X_5 = 80$; $Y = 2.316$

$\hat{Y} = -2.4873 + .0544(16) + .0166(163) + .0141(80) = 2.217$

$x_1 = .0704$; $x_2 = 1.0939$; $x_5 = -10.0235$

$s_{\hat{Y}} = \sqrt{.001670\,[1 + .1876(.0704)^2 + .0124(1.0939)^2 + .0052(-10.0235)^2 + .0014(.0704)(1.0939) - .0024(.0704)(-10.0235) - .0132(1.0939)(-10.0235)]} = .0530$

Fiducial limits $(.99) = 2.217 \pm 2.600(.0530) = (2.079, 2.355)$
Fiducial limits $(.95) = 2.217 \pm 1.972(.0530) = (2.112, 2.322)$

The Discriminant Function. The ordering of things into classes is a basic procedure of empirical science. In fact, the rigorousness of the basis of scientific classification is an index of the development of a field as a science. Statistical methods are available which can be profitably applied to the problem of discriminating between different populations and classifying them. The aspect of the problem to be discussed here deals with the statistical uses of multi-measurement for differentiating between two or more groups of individuals, things, or events. This is frequently a problem in economics, education, psychology, or in the various fields of science. For instance, individuals upon whom several measurements are available are to be classified into groups with a minimum of overlapping. The traditional method is to compute the significance of the difference between the means of groups taking each character separately. This method is inefficient in that it does not make possible the evaluation of the relative amount of information for differentiation provided by the several measurements; neither does it combine the information taking into account the interrelations, if they exist, between the characters dealt with. From this observation, the problem is clearly one analogous to multiple regression; that is, a weighted sum of the measurements is needed as in multiple regression. The difference lies in the nature of the criterion which is, in the problem discussed here, qualitative rather than quantitative as in the case of multiple regression. That is, the dependent variable is a dichotomy or a multiple classification. The particular statistic for the solution of this problem, which is called the discriminant function, was developed by Fisher (Ref. 10). The essential property of this function, which is a linear function of the observations, is that it will distinguish better than any other linear function between the specified groups on whom common measurements are available. The principle upon which the discriminant function rests


is that the linear functions of the measurements will maximize the ratio 
of the difference between th 


within classes. This type of problem is also closely related to that 


is generalization of 
called, which is a 
for testing the significance between mean values of different 
pulations under the assumptions of equal variances 
Closely related also is the statistic developed by 
and studied further by Bose and Roy (Ref. 3), 
ized form of the distribution, in statistics called the 


. H " 7 
generalized distance function, D?. By the use of D?, different multivariate 
populations can be not merely discri 


D? contributes both to the proble sti i f 


Two Groups. 


" М» observations, respec- 
tively, and make р measurements Ху, . 


2.2 X, on each individual, con- 
linear function of the measurements will 


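$$\alpha = \sum_i \lambda_i X_i \qquad (i = 1, \ldots, p) \qquad (14.34)$$

Let the difference between the means of $X_i$ be represented by $d_i$, where $i = 1, \ldots, p$ for the $p$ measurements. Represent the sum of squares or products from the specific means within classes by $S_{ij}$, where $i, j = 1, \ldots, p$. Then for any linear function, $\alpha$, of the measurements, the difference between the means of $\alpha$ in the two specific groups is

$$D = \sum_i \lambda_i d_i \qquad (i = 1, \ldots, p) \qquad (14.35)$$

while the variance of $\alpha$ within classes is proportional to

$$S = \sum_i \sum_j \lambda_i \lambda_j S_{ij} \qquad (i, j = 1, \ldots, p) \qquad (14.36)$$

The particular function which best discriminates the two groups will be one for which the ratio $D^2/S$ is greatest, by variation of the $p$ coefficients, $\lambda_1, \ldots, \lambda_p$, independently. Mathematically, we should seek the solution for each $\lambda$:

$$\frac{\partial}{\partial \lambda}\left[\frac{D^2}{S}\right] = 0 \qquad (14.37)$$

which reduces to

$$\frac{D}{S^2}\left(2S\frac{\partial D}{\partial \lambda} - D\frac{\partial S}{\partial \lambda}\right) = 0 \qquad (14.38)$$

and consequently,

$$\frac{1}{2}\frac{\partial S}{\partial \lambda} = \frac{S}{D}\frac{\partial D}{\partial \lambda} \qquad (14.39)$$

where it may be noticed that $S/D$ is a factor common to the $p$ unknown $\lambda$'s. Therefore, the coefficients required are proportional to the solutions of the normal equations:

$$\begin{aligned} S_{11}\lambda_1 + \cdots + S_{1p}\lambda_p &= d_1 \\ S_{21}\lambda_1 + \cdots + S_{2p}\lambda_p &= d_2 \\ \cdots \qquad & \\ S_{p1}\lambda_1 + \cdots + S_{pp}\lambda_p &= d_p \end{aligned} \qquad (14.40)$$

Let us define:

$$L_i = \sqrt{S_{ii}}\,\lambda_i \qquad (i = 1, \ldots, p) \qquad (14.41)$$

In (14.40) we divide the $i$th equation by $\sqrt{S_{ii}}$, where $i = 1, \ldots, p$. Then we have the following set of normal equations:

$$\begin{aligned} L_1 + r_{12}L_2 + \cdots + r_{1p}L_p &= \frac{d_1}{\sqrt{S_{11}}} \\ \cdots \qquad & \\ r_{p1}L_1 + r_{p2}L_2 + \cdots + L_p &= \frac{d_p}{\sqrt{S_{pp}}} \end{aligned} \qquad (14.42)$$

We can easily solve (14.42) for the $L$'s by Fisher's method of auxiliary statistics, in which unity is substituted for each of the $d_i/\sqrt{S_{ii}}$'s in turn, while the others are made equal to zero as follows:

$$r_{i1}k_{1j} + \cdots + r_{ip}k_{pj} = \begin{cases} 1 & (i = j) \\ 0 & (i \neq j) \end{cases} \qquad (14.43)$$

Let us define the means of $\alpha$ for these two groups: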


$$\bar{\alpha}_1 = \sum_i \lambda_i \bar{X}_{i1} \qquad (i = 1, \ldots, p) \qquad (14.44)$$

$$\bar{\alpha}_2 = \sum_i \lambda_i \bar{X}_{i2} \qquad (i = 1, \ldots, p) \qquad (14.45)$$

where $\bar{X}_{i1}$ is the mean value of $X_i$ for the first group and $\bar{X}_{i2}$ is the mean value of $X_i$ for the second group. We wish to test the hypothesis:

$$H_0: E(\bar{\alpha}_1) = E(\bar{\alpha}_2) \qquad (14.46)$$

($E$ is the notation for the expected value of a parameter), that is, the hypothesis that there is no significant difference between the two groups for the function $\alpha$. By mathematical deductions, the sums of squares due to "within groups" and "between groups" are

"Within groups": $D$, with $n_2 = N_1 + N_2 - p - 1$ $\qquad (14.47)$

"Between groups": $\dfrac{N_1 N_2}{N_1 + N_2} D^2$, with $n_1 = p$ $\qquad (14.48)$

Then the test of $H_0$ is given by

$$F = \frac{N_1 + N_2 - p - 1}{p} \cdot \frac{N_1 N_2}{N_1 + N_2}\, D$$
If we reject the hypothesis, $H_0$, we may conclude that the obtained values of the $\lambda$'s are the assigned weights of the measurements which best discriminate these two groups. The next problem then arises: if we have another individual, observed by making the same measurements $X_1, \ldots, X_p$ on him, we wish to know to which group he

shown two methods for solving this 
sufficiently large, and (2) when N, and 
8, we assume that Уі and №, are suffi- 
erion, let us denote by ті and т the 
he second group, respectively. The 
that the individual is drawn from ті. 


D 


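First we calculate:

$$\bar{\alpha}_1 = \sum_i \sum_j S^{ij} \bar{X}_{i1} d_j = \lambda_1 \bar{X}_{11} + \cdots + \lambda_p \bar{X}_{p1} \qquad (i, j = 1, \ldots, p) \qquad (14.49)$$

$$\bar{\alpha}_2 = \sum_i \sum_j S^{ij} \bar{X}_{i2} d_j = \lambda_1 \bar{X}_{12} + \cdots + \lambda_p \bar{X}_{p2} \qquad (i, j = 1, \ldots, p) \qquad (14.50)$$

$$U = \sum_i \sum_j S^{ij} X_i d_j = \lambda_1 X_1 + \cdots + \lambda_p X_p \qquad (i, j = 1, \ldots, p) \qquad (14.51)$$

where $\bar{\alpha}_1$, $\bar{\alpha}_2$, $\bar{X}_{i1}$, $\bar{X}_{i2}$, and $d_j$ are defined as before; $S^{ij}$ is the element of the inverse of the matrix of the $S_{ij}$, so that $\lambda_i = \sum_j S^{ij} d_j$; $X_i$ is the value obtained by this individual on the $i$th measurement; and $U$ is the value obtained by the individual for the linear function $\alpha$. Then the critical region for rejecting the hypothesis with the least risk of both kinds of error, that is, accepting the hypothesis when it is false and rejecting the hypothesis when it is true, is given by

$$U \geq \frac{\bar{\alpha}_1 + \bar{\alpha}_2}{2} \qquad (14.52)$$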


Problem XIV.2. Discrimination between two groups. There were two classes in the College of Science, Literature and Arts in the University of Minnesota. One class was taking the course Physics 7, which was more advanced than Physics 1 taken by the other class. Three measurements were available for each individual: mathematical test score, American Council Examination (A.C.E.) test score, and honor-point ratio (H.P.R.). Let us denote the mathematical test score by $X_1$, the A.C.E. test score by $X_2$, and the H.P.R. by $X_3$. The calculated measures are summarized in Table 101.


TABLE 101
CALCULATED MEASURES FOR TWO GROUPS

                     Physics 1      Physics 7
N                        111            257
ΣX1                    9,728         23,746
Mean X1              87.6396        92.3969
ΣX2                    3,450         14,411
Mean X2              31.0811        56.0739
ΣX3                    128.6          326.1
Mean X3               1.1586         1.2689
ΣX1²                 905,694      2,388,412
ΣX2²                 118,846        823,945
ΣX3²                  200.84         534.17
ΣX1X2                307,220      1,349,410
ΣX1X3               11,756.0       31,974.6
ΣX2X3                4,240.8       19,122.3

d1 =  4.7573
d2 = 24.9928
d3 =  0.1103


In Table 102 are recorded the computations leading to the pooled sum of squares and products within the two groups. In the line of totals, the entries are the sums of squares and products of the entire 368 individuals. In the line for groups are put down the sums of squares and products of the group sums in Table 101, calculated in the manner characteristic of analysis of variance and covariance. As an example, the entry for column $X_1$ in row $X_1$ of Table 102 is

$$\frac{(9728)^2}{111} + \frac{(23{,}746)^2}{257} = 3{,}046{,}614.8969$$

and for column $X_2$, row $X_1$,

$$\frac{(9728)(3450)}{111} + \frac{(23{,}746)(14{,}411)}{257} = 1{,}633{,}888.2977$$

TABLE 102
THE SQUARED DEVIATIONS WITHIN TWO GROUPS

                             X1                 X2               X3
X1   Totals            3,294,106          1,656,630         43,730.6
     Groups            3,046,614.8969     1,633,888.2977    41,401.0826
     Within groups       247,491.1031        22,741.7023     2,329.5174
X2   Totals                                 942,791         23,363.1
     Groups                                 915,311.1344    22,282.7356
     Within groups                           27,479.8656     1,080.3644
X3   Totals                                                    735.01
     Groups                                                    562.7697
     Within groups                                             172.2403

$\sqrt{S_{11}} = 497.4848$; $\sqrt{S_{22}} = 165.7705$; $\sqrt{S_{33}} = 13.1240$
$r_{12} = .275763$; $r_{13} = .356796$; $r_{23} = .496589$


The differences in the third line are the sums of squares and products of deviations within the groups. The calculation of the standard deviations and the correlations now proceeds in the usual manner. As examples,

$$s_1 = \sqrt{\frac{247{,}491.1031}{366}} = 26.003942$$

$$r_{12} = \frac{22{,}741.7023}{(497.4848)(165.7705)} = .275763$$

The degrees of freedom used, 366, are those within the two groups, $(N_1 - 1) + (N_2 - 1)$.
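The pooled within-group sums of squares and products follow directly from the sums in Table 101. The sketch below illustrates the calculation for $X_1$ and $X_2$ only; the layout of the dictionaries and the function names are assumptions made for illustration.

```python
import math

N1, N2 = 111, 257
SUMS_1 = {"X1": 9728.0, "X2": 3450.0}        # Physics 1
SUMS_2 = {"X1": 23746.0, "X2": 14411.0}      # Physics 7
# Sums of squares and products over all 368 individuals (Physics 1 + Physics 7).
TOTALS = {
    ("X1", "X1"): 905694.0 + 2388412.0,
    ("X1", "X2"): 307220.0 + 1349410.0,
    ("X2", "X2"): 118846.0 + 823945.0,
}


def group_term(u, v):
    """Entry of the 'Groups' line: products of group sums divided by group size."""
    return SUMS_1[u] * SUMS_1[v] / N1 + SUMS_2[u] * SUMS_2[v] / N2


def within(u, v):
    """Entry of the 'Within groups' line: totals minus groups."""
    return TOTALS[(u, v)] - group_term(u, v)


s11, s12, s22 = within("X1", "X1"), within("X1", "X2"), within("X2", "X2")
s1 = math.sqrt(s11 / (N1 + N2 - 2))          # pooled standard deviation of X1
r12 = s12 / math.sqrt(s11 * s22)             # pooled within-group correlation
print(round(group_term("X1", "X1"), 4), round(group_term("X1", "X2"), 4))
print(round(s1, 4), round(r12, 6))
```

The same two functions give the remaining entries of Table 102 once the sums involving $X_3$ are added to the dictionaries.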
Calculate:

$\dfrac{d_1}{\sqrt{S_{11}}} = .009563$;  $\dfrac{d_2}{\sqrt{S_{22}}} = .150767$;  $\dfrac{d_3}{\sqrt{S_{33}}} = .008404$

Consequently, we obtain the following set of normal equations:

$1.000000L_1 + .275763L_2 + .356796L_3 = .009563$
$.275763L_1 + 1.000000L_2 + .496589L_3 = .150767$
$.356796L_1 + .496589L_2 + 1.000000L_3 = .008404$

The solutions of $L_1$, $L_2$, and $L_3$ are carried out in Table 103.
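As a cross-check on the tabular solution, the three normal equations can also be solved directly. The sketch below uses ordinary Gaussian elimination rather than the method of auxiliary statistics used in the text; the within-group sums of squares $S_{ii}$ are assumed to be those obtained from the sums in Table 101, and the function name `solve` is hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back-substitution
        carried = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - carried) / m[r][r]
    return x


R = [
    [1.000000, 0.275763, 0.356796],
    [0.275763, 1.000000, 0.496589],
    [0.356796, 0.496589, 1.000000],
]
RIGHT = [0.009563, 0.150767, 0.008404]               # d_i / sqrt(S_ii)
S_DIAG = [247491.1031, 27479.8656, 172.2403]         # assumption: S_ii from Table 101 sums

L = solve(R, RIGHT)
lam = [l_i / math.sqrt(s_ii) for l_i, s_ii in zip(L, S_DIAG)]   # lambda_i = L_i / sqrt(S_ii)
print([round(v, 6) for v in L])
print([round(v, 8) for v in lam])
```

The weights obtained this way should agree with the $\lambda$'s reported for this problem to within rounding of the carried decimals.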

In Table 103 a convenient check column is often carried along, to the 
right of these computations. The first and second entries of this check 
column are found by adding all other entries in their respective rows. 
The third entry is found in two ways, thus yielding a check on the 
accuracy of the arithmetical computations. The first way consists of 
addition of all entries in the third row; the other way consists of operat- 
ing on the first entry in the check column in accordance with the directions 
given at the left. The other entries in the check column are found in a 


similar way.

The values of $k_{31}$, $k_{32}$, and $k_{33}$ can be read directly⁷ from the last row, (10), in Table 103:

$k_{31} = -.339403$; $k_{32} = -.614720$; $k_{33} = 1.426361$

⁷ The k-values are used in the calculations of the L's as noted on page 351.

TABLE 103

Directions                    (A)        (B)        (C)     =     (D)        (D')       (D'')      Check
(1)  Eq. I                 1.000000    .275763    .356796      1          0          0          2.632559
(2)  Eq. II                 .275763   1.000000    .496589      0          1          0          2.772352
(3)  Eq. I × (-.275763)    -.275763   -.076045   -.098391    -.275763     0          0          -.725962
(4)  (2) + (3)                          .923955    .398198    -.275763   1.000000     0          2.046390
(5)  (4) ÷ (.923955)                   1.000000    .430971    -.298459   1.082304     0          2.214816
(6)  Eq. III                .356796    .496589   1.000000      0          0          1          2.853385
(7)  Eq. I × (-.356796)    -.356796   -.098391   -.127303    -.356796     0          0          -.939287
(8)  (4) × (-.430971)                  -.398198   -.171612     .118846   -.430971     0          -.881935
(9)  (6) + (7) + (8)                               .701085    -.237950   -.430971   1.000000    1.032163
(10) (9) ÷ (.701085)                              1.000000    -.339403   -.614720   1.426361    1.472238

$k_{11} = 1.163065$;  $k_{12} = -.152186$;  $k_{13} = -.339403$
$k_{21} = -.152186$;  $k_{22} = 1.347230$;  $k_{23} = -.614720$
$k_{31} = -.339403$;  $k_{32} = -.614720$;  $k_{33} = 1.426361$
$L_1 = -.014676$;  $L_2 = .196496$;  $L_3 = -.083938$
$\lambda_1 = -.00002950$;  $\lambda_2 = .00118535$;  $\lambda_3 = -.00639576$

In order to obtain $k_{21}$, $k_{22}$, and $k_{23}$:


Substitute $k_{31}$ in Eq. (5), Table 103, using column (D) in the right member:

$k_{21} + .430971(-.339403) = -.298459$
$k_{21} = -.152186$

Substitute $k_{32}$ in Eq. (5), Table 103, using (D') in the right member:

$k_{22} + .430971(-.614720) = 1.082304$
$k_{22} = 1.347230$

Substitute $k_{33}$ in Eq. (5), Table 103, using (D'') in the right member:

$k_{23} + .430971(1.426361) = 0$
$k_{23} = -.614720$

To obtain $k_{11}$, $k_{12}$, and $k_{13}$:

Substitute $k_{21}$ and $k_{31}$ in Eq. (1), Table 103, using (D):

$k_{11} + .275763(-.152186) + .356796(-.339403) = 1$
$k_{11} = 1.163065$

Substitute $k_{22}$ and $k_{32}$ in Eq. (1), Table 103, using (D'):

$k_{12} + .275763(1.347230) + .356796(-.614720) = 0$
$k_{12} = -.152186$

Substitute $k_{23}$ and $k_{33}$ in Eq. (1), Table 103, using (D''):

$k_{13} + .275763(-.614720) + .356796(1.426361) = 0$
$k_{13} = -.339403$

It is noted that

$k_{ij} = k_{ji} \qquad (i \neq j;\ i, j = 1, 2, 3)$

This is a good check on the calculation of $k_{ij}$ $(i \neq j)$. The check of $k_{ii}$ $(i = 1, 2, 3)$ can be carried out easily. To obtain $k_{11}$, the sum of products of the last two numbers (regardless of sign) in each section in column (D) is found. For $k_{22}$ do the same in column (D'), and so on. We have

$k_{11} = 1.000000 + .275763(.298459) + .237950(.339403) = 1.163065$
$k_{22} = 1.000000(1.082304) + .430971(.614720) = 1.347230$
$k_{33} = 1.000000(1.426361) = 1.426361$


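The values of $L_1$, $L_2$, and $L_3$ are obtained by calculating the following equations:

$$L_i = k_{i1}\frac{d_1}{\sqrt{S_{11}}} + k_{i2}\frac{d_2}{\sqrt{S_{22}}} + k_{i3}\frac{d_3}{\sqrt{S_{33}}} \qquad (i = 1, 2, 3)$$

Consequently, the values of $\lambda_1$, $\lambda_2$, and $\lambda_3$ are obtained by calculating the following equations:

$\lambda_1 = \dfrac{L_1}{\sqrt{S_{11}}}$;  $\lambda_2 = \dfrac{L_2}{\sqrt{S_{22}}}$;  $\lambda_3 = \dfrac{L_3}{\sqrt{S_{33}}}$

All these values are shown in Table 103.

The next step is to calculate the value of $D$. As a check we can use two equations:

$$D = L_1\frac{d_1}{\sqrt{S_{11}}} + L_2\frac{d_2}{\sqrt{S_{22}}} + L_3\frac{d_3}{\sqrt{S_{33}}}$$

$$D = \lambda_1 d_1 + \lambda_2 d_2 + \lambda_3 d_3$$

In our case: $D = .028779$

This value is also the "within" sum of squares. The "between" sum of squares is

$$\frac{N_1 N_2}{N_1 + N_2} D^2 = .064186$$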

The test of significance between two groups on the variable $\alpha$ is given in Table 104.

TABLE 104
ANALYSIS OF VARIANCE OF α BETWEEN AND WITHIN GROUPS

Source of variation    D.F.    S.S.       M.S.         F          Hypothesis
Within groups           364    .028779    .00007906
Between groups            3    .064186    .021395      270.617    Rejected
Total                   367    .092965

Referring to the F-table with $n_1 = 3$ and $n_2 = 364$, we find $P < .01$. Therefore, we reject the hypothesis of homogeneous groups; and the relative value of the variable $\alpha$ for discriminating between groups is apparently indicated by the weights of the different measurements:

$\lambda_1 = -.00002950$; $\lambda_2 = .00118535$; $\lambda_3 = -.00639576$

Now suppose an individual is given these same measurements and obtains

$X_1 = 80$; $X_2 = 40$; $X_3 = 1.5$

We wish to know to which group this individual should be assigned. First, we calculate

$\bar{\alpha}_1 = -.00002950(87.6396) + .00118535(31.0811) - .00639576(1.1586) = .026846$

$\bar{\alpha}_2 = -.00002950(92.3969) + .00118535(56.0739) - .00639576(1.2689) = .055626$

$U = -.00002950(80) + .00118535(40) - .00639576(1.5) = .035460$

$\dfrac{\bar{\alpha}_1 + \bar{\alpha}_2}{2} = .041236$

It is evident that $U < \dfrac{\bar{\alpha}_1 + \bar{\alpha}_2}{2}$. Therefore, we may conclude that this individual should be assigned to the class Physics 1.
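The whole decision rule for this problem fits in a few lines. The sketch below is an illustration using the rounded weights and means printed above, so the variance ratio it returns agrees with Table 104 only to about three significant figures; the function names are hypothetical.

```python
LAMBDA = [-0.00002950, 0.00118535, -0.00639576]      # weights for X1, X2, X3
MEAN_PHYSICS_1 = [87.6396, 31.0811, 1.1586]
MEAN_PHYSICS_7 = [92.3969, 56.0739, 1.2689]
N1, N2, P = 111, 257, 3


def score(x):
    """Value of the linear function: sum of lambda_i times X_i."""
    return sum(weight * value for weight, value in zip(LAMBDA, x))


def classify(x):
    """Assign to Physics 7 when the score reaches the midpoint of the group means."""
    cutoff = (score(MEAN_PHYSICS_1) + score(MEAN_PHYSICS_7)) / 2.0
    return "Physics 7" if score(x) >= cutoff else "Physics 1"


D = score(MEAN_PHYSICS_7) - score(MEAN_PHYSICS_1)    # also the "within" sum of squares
between = N1 * N2 / (N1 + N2) * D ** 2
F = (between / P) / (D / (N1 + N2 - P - 1))

individual = [80, 40, 1.5]
print(round(D, 6), round(between, 6), round(F, 1))
print(round(score(individual), 6), classify(individual))
```

The cutoff at the midpoint of the two group means corresponds to the case treated in the text, in which both groups are taken to be sufficiently large.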

PROBLEMS 


1. What methods of multivariate analysis other than those reported in this chapter are available? Which of these are applicable to problems in the field of your interest? [In this connection, see Tintner, Gerhard, "Some Applications of Multivariate Analyses to Economic Data," Journal of the American Statistical Association, Vol. 41, pp. 472-500 (December, 1946).]

2. Specify the problem of factor analysis in psychology as a special application of the theory of regression. [See Holzinger, K. J., and Harman, H. H., Factor Analysis (University of Chicago Press, 1941); Thomson, G. H., The Factorial Analysis of Human Ability (Houghton Mifflin Company, 2d ed., 1946); Thurstone, L. L., Multiple-Factor Analysis: A Development and Expansion of the Vectors of Mind (University of Chicago Press, 1947).]

3. The following data for a random sample of 50 students were taken from a study dealing with the prediction of achievement of freshmen in a particular college of the University of Minnesota:

Y1 = honor-point ratio at the end of the fall quarter
Y2 = honor-point ratio at the end of the freshman year
X1 = score on Johnson Science Application Test
X2 = score on an English test
X3 = score on the Cooperative Algebra Test
X4 = percentile rank in high-school graduation class transformed to probits

In this problem you are to do the following:

(a) Set up the multiple regression equation for predicting either Y1 or Y2 from X1, X2, X3, and X4.
(b) Test the significance of the
    (1) Standard partial regression coefficients (the betas).
    (2) Multiple correlation coefficient.
    (3) Differences between the respective betas.
(c) Set up a new multiple regression equation eliminating the independent variable or variables that are not statistically significant.
(d) Repeat (b).
(e) Calculate the standard error of the predicted score and set up the confidence interval, with a confidence coefficient of 98 per cent, for Students 8, 25, 43, and 47.
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a Xa 
Student| Нопог- Honor- | Joh: Соо] x: i ОГ 
No. point ratio | point ratio anos English Algebra H DA mits 
jJ | fw | G | d) | ® ve mq uns 
1 1.65 1.57 56 
2 1.29 1.38 34 is 48 564 
3 88 1:15 32 94 75 192 
" 1.29 11 55 47 32 648 
5 94 54 37 126 59 8:08 
6 80 83 32 
7 46 50 33 i5 И 5:95 
8 1 90 85 62 148 58 0:08 
10 ES 5% a 19 16 4 2 
11 13 64 ; 
12 20 39 2 % 1 197 
13 44 170 33 106 16 148 
n 1:00 1:49 44 80 31 29 
1 “60 41 84 05 
28 5.05 
16 1.27 1.67 
17 1.06 1.65 28 109 55 5:10 
18 71 184 50 23 133 
19 = 07 = 94 31 93 20 34i 
20 1.65 1.43 43 74 32 881 
E 
21 1.59 5 
21 EZ. 50 59 87 58 4.87 
2 12 43 38 95 14 4.01 
24 12 41 27 108 5 27% 
25 00 41 38 71 23 540 
5.0 
26 1.12 1 
25 12 42 40 122 12 5.39 
28 1.00 1.12 46 33 26 4% 
29 1.31 .98 55 it 5 5.98 
30 1.56 1:14 52 86 15 ix 
31 1.71 1.08 46 " 
32 13 183 48 in 23 5-% 
33 53 1.06 59 105 e 5.81 
12 .60 25 9 n 55 
85 29 .69 42 2 i 2% 
72 0 5.67 
36 15 .58 39 115 
9 17 5 20 
37 09 17 a 116 40 175 
39 09 7 37 116 2 175 
40 1.41 .84 62 89 a 2% 
4. 
a 24 11 24 58 
di 13 5.00 
43 2.00 1.56 2 "d 17 FU 
44 07 18 45 12 i б 
45 1.00 1.30 52 86 % 55 
46 1.57 1.67 65 н 
47 — .62 = .77 31 p! 5 5-03 
47 = a 31 98 36 2.67 
B 1 p 1.33 51 181 E i 
А 0.00 30 11 
75 21 4.12 
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4. The following statistics were derived from data collected in a study 
dealing with the relation between instruction in a course in college 
biology and the students’ belief in the efficacy of certain commercial 
preparations and home remedies. The criterion was the score on the 
test, Y. The independent variates, Xi, X», Xs, X4, and X; are 
specified below. In this problem: 


(a) Set up the multiple regression equation for estimating Y from X,, 
Xs, Xs, X4, and Xs. 

(b) Test the significance of the 
(1) Standard partial regression coefficients. 

(2) Multiple correlation coefficient. 

(с) Of the variance of the dependent variate Y accounted for by the 
combined effect of the independent variates, calculate the pro- 
portion assignable to each of the independent variates. (See 
Johnson, Palmer O., “Тһе Differential Function of Examinations,” 
Journal of Educational Research, Vol. 30 (1936), pp. 93-103.) 

(d) Test the significance of the difference between the two largest 
partial regression coefficients. 

(e) Find the 5 per cent fiducial limits for the largest partial regression 


coefficient. 

(f) Calculate the partial correlation coefficient, туха. х, and test its 
significance. 

Zero order correlations: N = 223

r12 = .452
r13 = .303   r23 = .638
r14 = .924   r24 = .274   r34 = .171
r15 = .147   r25 = .826   r35 = .190   r45 = .189
r1y = .514   r2y = .621   r3y = .542   r4y = .197   r5y = .184

s1 = 6.50     Mean X1 = 22.52    where Y = score on application in hygiene
s2 = 23.78    Mean X2 = 80.54    X1 = score on test of facts and principles in hygiene
s3 = 4.20     Mean X3 = 23.4     X2 = score on vocabulary test in hygiene
s4 = 0.956    Mean X4 = 4.95     X3 = score on final examination in hygiene
s5 = 1.00     Mean X5 = 5.60     X4 = transformed high-school percentile ranks
sy = 4.89     Mean Y = 32.08     X5 = transformed College Aptitude Test percentile ranks


5. The following data were collected on two groups of students in an 
experimental investigation of the relative efficacy of two different 
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methods in teaching agricultural chemistry at the high-school level 
Compare the two groups with respect to the set of multiple measure- 
ments made at the beginning of the experiment. 


Topical Assignment Group Discussion Group 
Pupil| X, Xs Ха X, Pupil| X, X: X; X; 
1 110 | 1.37 | 38 27 T 103 | 1.50] 26 15 
2 81. | 1.62] 18 3 2 115 | 1.74 | 28 12 
3 111| 2.49 | 26 10 3 104 | 1.36 | 25 7 
4 | 110 [1.96 | 20 Т 4 85 | 0.25 | 20 3 
5 95 | 0.86 | 15 8 5 84 | 0.53 | 17 5 
6 85 | 0.56 | 14 10 6 87 | 0.25 | 23 6 
7 97 |1.38| 25 9 yi 93 | 2.00 | 34 6 
8 90 | 0.25 13 3 8 119 | 1.94 24 4 
9 85 | 0.51 | 21 11 9 123 |2.64| 44 16 
10 83 | 0.78 | 21 5 10 106 | 0.75 | 20 10 
11 83 | 1.15] 22 7 11 99 | 2.11 | 24 13 
12 | 100|2.24 | 31 15 12 80|0.45| 22 7 
13 106 | 0.72 | 22 3 13 112| 1.96 | 40 16 
14 92 | 1.36 | 20 6 14 91 | 1.19 | 17 5 
15 94 | 1.25 | 16 1 15 77 | 0.42 |. 14 6 
16 16 96 | 1.08 | 20 1 
17 17 85 | 0.90 | 15 2 
18 18 115 | 1.65 | 22 7 
E 19 117 | 1.75 | 26 11 
ا‎ к i, 


X1 = Intelligence quotient based on Kuhlman-Anderson Tests.

X2 = Honor-point ratio of previous year's work.

X3 = Score on pretest of knowledge of facts and principles examination administered at the beginning of the term.

X4 = Score on pretest of Glenn-Welton Chemistry Achievement Test administered at the beginning of the year.
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APPENDIX 


TABLE I* 
Proportion or THE CASES IN A NORMAL DISTRIBUTION LYING BELOW CERTAIN 
VALUES OF ТПЕ ABSCISSA 


Abscissa 


Abscissa Proportion Abscissa Proportion Proportion 
Х- М. of cases X-M _ of cases |X—M of canes 
S S below z S 7 below: Noc below z 
.00 .5000 1.25 .8944 2.50 .9938 
.05 .5199 1.30 .9032 2.55 .9946 
.10 .5398 1.35 .9115 2.60 .9953 
.15 .5596 1.40 .9192 2.65 .9960 
.20 .5798 1.45 .9265 2.70 .9965 
.25 .5987 1.50 .9332 2.75 .9970 
.30 .6179 1.55 .9394 2.80 .9974 
.85 .6368 1.60 .9452 2.85 .9978 
.40 .6554 1.65 .9505 2.90 .9981 
.45 .6736 1.70 .9554 2.95 .9984 
.50 .6915 1,48 .9599 3.00 .9987 
.55 .7088 1.80 .9641 3.05 .9989 
.60 .7257 1.85 .9678 3.10 .9990 
.65 .7422 1.90 .9713 3.15 .9992 
.70 .7580 1.95 .9744 3.20 .9993 
‚75 .7734 2.00 .9772 8.25 .9994 
.80 . 7881 2.05 .9798 3.30 .9995 
.85 .8023 2.10 .9821 3.35 .9996 
.90 .8159 2.15 .9842 3.40 .9997 
.95 .8289 2.20 .9861 3.45 .9997 
1.00 .8413 2.25 .9878 3.50 .9998 
1.05 .8531 2.30 .9893 3.55 .9998 
1.10 .8643 2.35 .9906 3.60 .9998 
1.15 .8749 2.40 .9918 3.65 .9999 
1.20 .8849 2.45 .9929 3.70 .9999 


* Table arranged by Dr. Robert W. B. Jackson and used with his permission. For the extended table 
and for other tables of the normal curve the reader is referred to the tables given by Karl Pearson in Tables 
for Statisticians and Biometricians, Part I, issued by the Biometric Laboratory, University College, 
London. 
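
Entries of Table I can be checked, or values between the tabulated abscissas obtained, by direct computation. The following is a minimal sketch, not part of the original table: it assumes the standard identity relating the normal distribution function to the error function, and the function name `proportion_below` is introduced here purely for illustration.

```python
import math


def proportion_below(z: float) -> float:
    """Proportion of the cases in a normal distribution lying below
    the abscissa z = (X - M)/S (hypothetical helper, for illustration)."""
    # Standard identity: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Print a few abscissas in the same four-decimal form as Table I.
    for z in (0.00, 1.25, 2.50):
        print(f"{z:.2f}  {proportion_below(z):.4f}")
```

Rounded to four decimals, the computed values should agree with the tabulated proportions; for an abscissa below the mean, the proportion is one minus the tabulated value for the corresponding positive abscissa.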
TABLE II* 
DISTRIBUTION OF t 
Probability 

n    .9    .8    .7    .6    .5    .4    .3    .2    .1    .05    .02    .01    .001 
1    .158  .325  .510  .727  1.000  1.376  1.963  3.078  6.314  12.706  31.821  63.657  636.619 
2    .142  .289  .445  .617  .816  1.061  1.386  1.886  2.920  4.303  6.965  9.925  31.598 
3    .137  .277  .424  .584  .765  .978  1.250  1.638  2.353  3.182  4.541  5.841  12.941 
4    .134  .271  .414  .569  .741  .941  1.190  1.533  2.132  2.776  3.747  4.604  8.610 
5    .132  .267  .408  .559  .727  .920  1.156  1.476  2.015  2.571  3.365  4.032  6.859 
6    .131  .265  .404  .553  .718  .906  1.134  1.440  1.943  2.447  3.143  3.707  5.959 
7    .130  .263  .402  .549  .711  .896  1.119  1.415  1.895  2.365  2.998  3.499  5.405 
8    .130  .262  .399  .546  .706  .889  1.108  1.397  1.860  2.306  2.896  3.355  5.041 
9    .129  .261  .398  .543  .703  .883  1.100  1.383  1.833  2.262  2.821  3.250  4.781 
10   .129  .260  .397  .542  .700  .879  1.093  1.372  1.812  2.228  2.764  3.169  4.587 
11   .129  .260  .396  .540  .697  .876  1.088  1.363  1.796  2.201  2.718  3.106  4.437 
12   .128  .259  .395  .539  .695  .873  1.083  1.356  1.782  2.179  2.681  3.055  4.318 
13   .128  .259  .394  .538  .694  .870  1.079  1.350  1.771  2.160  2.650  3.012  4.221 
14   .128  .258  .393  .537  .692  .868  1.076  1.345  1.761  2.145  2.624  2.977  4.140 
15   .128  .258  .393  .536  .691  .866  1.074  1.341  1.753  2.131  2.602  2.947  4.073 
16   .128  .258  .392  .535  .690  .865  1.071  1.337  1.746  2.120  2.583  2.921  4.015 
17   .128  .257  .392  .534  .689  .863  1.069  1.333  1.740  2.110  2.567  2.898  3.965 
18   .127  .257  .392  .534  .688  .862  1.067  1.330  1.734  2.101  2.552  2.878  3.922 
19   .127  .257  .391  .533  .688  .861  1.066  1.328  1.729  2.093  2.539  2.861  3.883 
20   .127  .257  .391  .533  .687  .860  1.064  1.325  1.725  2.086  2.528  2.845  3.850 
21   .127  .257  .391  .532  .686  .859  1.063  1.323  1.721  2.080  2.518  2.831  3.819 
22   .127  .256  .390  .532  .686  .858  1.061  1.321  1.717  2.074  2.508  2.819  3.792 
23   .127  .256  .390  .532  .685  .858  1.060  1.319  1.714  2.069  2.500  2.807  3.767 
24   .127  .256  .390  .531  .685  .857  1.059  1.318  1.711  2.064  2.492  2.797  3.745 
25   .127  .256  .390  .531  .684  .856  1.058  1.316  1.708  2.060  2.485  2.787  3.725 
26   .127  .256  .390  .531  .684  .856  1.058  1.315  1.706  2.056  2.479  2.779  3.707 
27   .127  .256  .389  .531  .684  .855  1.057  1.314  1.703  2.052  2.473  2.771  3.690 
28   .127  .256  .389  .530  .683  .855  1.056  1.313  1.701  2.048  2.467  2.763  3.674 
29   .127  .256  .389  .530  .683  .854  1.055  1.311  1.699  2.045  2.462  2.756  3.659 
30   .127  .256  .389  .530  .683  .854  1.055  1.310  1.697  2.042  2.457  2.750  3.646 
40   .126  .255  .388  .529  .681  .851  1.050  1.303  1.684  2.021  2.423  2.704  3.551 
60   .126  .254  .387  .527  .679  .848  1.046  1.296  1.671  2.000  2.390  2.660  3.460 
120  .126  .254  .386  .526  .677  .845  1.041  1.289  1.658  1.980  2.358  2.617  3.373 
∞    .126  .253  .385  .524  .674  .842  1.036  1.282  1.645  1.960  2.326  2.576  3.291 

* Table II is reprinted from Table III of Fisher and Yates, Statistical Tables for 
Biological, Agricultural and Medical Research, Oliver & Boyd, Ltd., Edinburgh, by permission of the 
authors and publishers. 
12 |.124|.387|.545|.641|.704|.749|.782|.807|.828|.857|.887|.910|.944|.973|1.000 

2 |.016|.141|.284|.398|.485|.551|.603|.645|.678|.730|.783|.836|.890|.945 
3 |.162|.314|.429|.514|.578|.628|.667|.699|.748|.798|.848|.898|.949 
4 |.188|.345|.459|.542|.604|.652|.689|.719|.765|.812|.859|.906|.953 
5 |.210|.370|.484|.565|.624|.670|.706|.735|.779|.823|.867|.911|.956 
6 |.230|.391|.504|.583|.641|.685|.720|.748|.789|.832|.874|.916|.958 
7 |.246|.409|.520|.597|.654|.697|.730|.757|.798|.839|.879|.920|.960 

18 |.093|.342|.504|.605|.672|.721|.756|.784|.807|.840|.873|.905|.937|.969 
20 |.100|.352|.512|.613|.679|.727|.761|.788|.811|.844|.876|.939|. 
22 |.105|.360|.520|.619|.684|.732|.765|.792|.814|.847|.878|.909|.940|.970 
24 |.110|.367|.526|.624|.688|.736|.768|.795|.817|.850|.880|.911|.941|.971 
26 |.115|.373|.532|.629|.693|.740|.772|.798|.820|.852|.882|.912|.942|.971 
28 |.119|.379|.537|.634|.697|.744|.776|.802|.823|.854|.884|.914|.943|.972 
30 |.123|.386|.543|.639|.703|.748|.781|.806|.827|.856|.886|.915|.944|.972 
Note: For n = 2, k = 50 and L.o = .151. 


* These tables are reproduced, by permission, from Statistical Research Memoirs, Vol. I, edited by J. 
Neyman and E. S. Pearson and issued by the Department of Statistics, University of London, University 
College, London. 
Agreement, coefficient of, 177 
calculation of, 178 
significance of, 178-179 
Aitken, A. C., 167 
Alexander, H. W., 147, 325 
Amount of information, see Information 
Analysis of variance: 
application of, to testing: 
differential educational development 
by grades, 246-252 
homogeneity of multiple groups of 
measurements, 231-234 
independence of mental ages of 
twins, 226-230 
linearity of regression of final on 
initial scores, 241-246 
assumptions underlying, 164, 212, 218, 
219, 226 
compared with traditional biometric 
method, 216 
division of degrees of freedom in, 214, 
215 
division of sums of squares in, 215, 216, 
220 
experimental and sampling designs 
dependent on, 210 
F-test, or z-test in, 54, 214 
interaction in, 222, 224, 265 
k-way classification, 224 
the solution for the sum of squares, 
224, 225 
one-way classification, 219 
maximum likelihood solution of, 219 
hypothesis tested, 220 
randomization in, 164 
two-way classification, 221 
maximum likelihood solution of, 221 
hypothesis tested, 222, 223 
unequal representation in the subclasses 
in, 260-261 
Analysis of variance and covariance: 
application to testing differential educational 
development by grade, 252-260 
complete procedure for analysis with 
one independent variable, 252-255 
complete procedure for analysis with 
two independent variables, 256-260 


Analysis of variance and covariance 
(cont.): 
application to testing equality of grade 
means and school means on a 
speed of reading test (approxi- 
mate method for unequal fre- 
quencies in subclasses of two 
classifications), 261-265 
application to testing identical twin 
achievement when inequality in 
mental age is eliminated, 235-240 
Analysis of covariance: 
assumptions underlying, 235 
principles of, 216, 235 
process of, 216 
purpose of, 216, 311 
Analysis of variation: 
application of, 211-216 
assignable causes of, 210 
chance causes of, 210 
fundamental problem in, 211 
hypothesis tested in, 213 
role of statistics in, 212 
test of significance in, 213, 214, 216 
Ancillary estimation, 107 
Anderson, R. L., 325 
Arbitrary corrections, 277 
Arithmetic mean, see Mean, arithmetic 
Assumptions, testing of, 17, 31 
in analysis of variance, 212, 226, 280 
in equivalent-form method, 127, 128 
in experimental design, 284 
in ranking, 166, 169, 170 
in sampling, 199 
in split-test method, 127 
underlying most statistical methods, 
155 
underlying product-moment coefficient 
of correlation, 241 
Attitudes, measuring intensity of, 183 


B 


Bacon, Sir Francis, 62 

Bartlett, M. S., 102, 167, 356 
χ² for multiple classification, 94 
testing homogeneity of variances, 83 

Baxter, Brent, 325 

Bayes’s postulate, 25 

Bayes’s theorem, 24 

Beall, Geoffrey, 356 




Behrens-Fisher test, 73 
Behrens, W. U., 102 
Bernouilli, theorem of, 20 
Beta coefficient: 
measure of kurtosis, 149, 151 
measure of skewness, 149, 151 
sampling distribution of, 151 
Beta function, incomplete, 118 
Bias in: 
estimation, 39, 41, 281 
sampling, 188, 197 
statistical tests, 167 
Binomial distribution, 25, 26 
limiting form of, 27 
moments of, 56, 57, 58 
Biserial phi-coefficient, 146 
Biserial r, 146 
Bishop, D. J., 102 
Bliss, C. I., 168, 356 
probit in testing normality, 160 
transforming ranks, 166 
Boltzmann, L., 5 
Bose, R. C., distribution of D², 344, 
356 
Brandt, A. E., 102 
Brandt and Snedecor, method of calculating 
χ², 94 
Carlson, W. S., 325 
Census of population, 185 
Central Limit Theorem, 27 
Chapin, F. Stuart, 326 
Chi-square, see χ² 
χ²-distribution: 
M independent tests by, 170- 


correction for continuity in, 94 
curve of, 42 
in r × c contingency tables, 94 
in 2 × 2 tables, 91, 93 
in testing: 
agreement of observation and hy- 
pothesis, 96 
goodness of fit, 37, 39 
T Бі o7 , 42, 45, 48, 51, 
homogeneity of frequency distribu- 
tions, 94 
hypotheses, 91, 96 
normality, 149 
principle of classification, 91 
properties of, for estimation, 116 
separating individual degrees of free- 
dom in, 95 
Circular triads, in preferences, 177 
Classification, 343 
Cochran-Cox, test for equality of means, 
74-75 
Cochran, W. G., 208, 225, 356 
on correction for continuity, 94, 102 
on log transformation, 163, 168 
on subsampling, 192 
Cohen, J. B., 208 
Collar, A. R., 147 
Complex experiments, 276, 296 
Concordance, coefficient of, 174, 
(see also m-rankings) 
testing significance of, 175 
Confidence coefficient, 111 
Confidence interval: 
compared with fiducial limits, 112 
compared with tolerance limits, 123 
for coefficient of correlation, 141 
for difference in percentages: 
on different samples, 120 
on same sample, 119 
for individual’s true score, 117 
for mean, population variance known, 
114 
for median, 117 
for one parameter, 111 
for several parameters, 112 
principle of selection, 111 
theory of, 110-112 
Confidence region, 112 
Consistence, coefficient of, 176 
significance of, 177 
of choices, 180 
Consistent, see Estimation 
Control(s): 
in experimentations, 278 
in purposive sampling, 191 
Cornell, F. G., 143, 208 
Correlation, coefficient of product-moment: 
combining estimates of, 53 
confidence interval for, 141 
Fisher's transformation of, 52 
maximum-likelihood estimate of, 123-125 
sampling distribution of, 48-52 
standard error of, 51 
tables of (David), 141 
testing assumptions underlying, 241-243 
testing significance of, on different 
samples, 53, 86 
testing significance of, on same sample, 
54, 87 
Correlation coefficient, multiple, 328 
equation of, 329 
testing significance of, 338, 339, 341 
Correlation coefficient, partial, 355 
Correlation, intra-class, 230 
Correlation, rank, Spearman’s coefficient 
of, 169 
as a test of significance, 170 


INDEX 


Correlation, rank (cont.): 
testing significance of, 170 
Correlation ratio, 147 
for ranked data, 173 
Cowden, D. J., 208 
Cox, G. M., 275, 357 
Craig, A. T., 208 
Criteria: 
of normality, 149, 153 
of optimum estimates, 105 
Critical region, see Statistical hypotheses 
Crump, S. L., 275 
Curtiss, J. H., 164, 168 


D 


D²-statistic, 344 
Darwin, Charles R. and Law of Large 
Numbers, 4 
Day, B. B., 357 
Degrees of freedom: 
geometric interpretation of, 139, 140 
physical interpretation of, 139 
statistical interpretation of, 140, 141 
Deming, W. E., 208, 209 
on errors in sampling surveys, 197 
Design of experiments: 
modern ideas of, 276 
nature of experimental observations, 
278 
orthogonality in, 295, 308 
randomization in, 282, 286 
test of significance dependent on, 
282 
validity of method of least squares 
dependent on, 284 
relation of statistical analysis to, 282 
necessity of exact tests and analysis 
of variance, 283 
replication, function of, in, 280, 286 
role of statistician in, 285 
self-contained property for, 277 
function of controls, 278 
valid estimate of experimental errors, 
280 
Design of sampling inquiries, 186, 192 
a comparative experiment on sampling 
methods, illustrative of, 202-207 
(See also Sampling) 
method of selecting sample, 203 
stratification proportionate to numbers, 
204 
stratification proportionate to product 
of numbers and standard 
deviation, 205 
stratification with no restriction, 
203 
statistical aspects in, 193, 199 
Dice, throws with, 21 
Difference of two correlation coefficients: 
sampling distribution of, 53 
test of: 
on different samples, 53, 86 
on same sample, 54, 87 
Difference of two means: 
sampling distribution of: 
with known population variance, 
37 
with unknown population variance, 
4 
test of: 
of correlated measures, 75, 76 
of equal variances, 73-75 
of unequal variances, 80 (see also 
Behrens-Fisher test and Cochran- 
Cox test) 
Difference of two percentages: 
sampling distribution of, 58-59, 165 
test of, 80-81, 120 
Difference of two regression coefficients, 
test of, 90 
Difference of two variances (or standard 
deviations): 
sampling distribution of, 54, 55 
test of, 81, 82 
Differences among set of variances: 
sampling distribution of, 83, 84 
test of: 
Bartlett’s test, 83, 84, 85 
Hartley's method, 84, 85 
L;-test, 83, 86 
Direct probability, see Probability 
Discriminant function, see Multivariate 
analysis 
Disproportionate class numbers, see 
Analysis of variance, unequal representation 
Distribution, curves, 28 
problems of, 104 
Distributions: 
binomial, 25 
polynomial, 26 
simultaneous probability, 124; (see also 
binomial, Poisson, and normal, 25, 
26, 27) 
theoretical, 22 
Doolittle method, see Normal equations 
Duncan, W. J., 147 
Dwyer, P. S., 357 


E 
Eden, T., 225 
Efficiency (see also Estimation): 
of pairing, 80 


of sampling, 192 
Eisenhart, C., 68, 225 
Engelhart, M. D., 326 
Errors (see also Sampling): 
experimental, 280 
of bias, 188 
of first and second kind, 64 
theory of, 27 
Estimates: 
best linear, 193 
large sample, 34 
of a ranking, 175 
optimum, 105 
unbiased, 41 
Estimation: 
bias in, 39, 41, 281 
consistency in, 105 
efficiency in, 105 
interval, 104, 109, 111; (see also Con- 
fidence and Fiducial) 
point, 104 
limitations of, 108 
problem of, 15, 104, 193 
analysis of variance in, 275 
sufficiency, 105 
Estimation, method of (see also maximum 
likelihood): 
by minimum x?, 107, 108 
by minimum variance, 107 
by moments, 107 
by principle of unbiased estimates, 108 
Expectation, mathematical, 22, 39, 105, 
193, 228, 346 


F 


Factor analysis (psychology), 353 
Factorial design, see Principles of experi- 
mentation 
F-distribution (variance-ratio), 55 
Ferguson, G. A., 162, 168 
Fiducial inference, theory of, 109-110 
Fiducial limits, 109 
compared with confidence interval, 
112 
of an individual's score, 343 
of the mean, population variance unknown, 
114-115 
of the variance, 115-117 
Fiducial probability, 109 
Finney, D. J., 142, 162 
Fisher, R. A., 102, 147, 168, 208, 225, 275, 
285, 326, 357 
analysis of covariance, 216 
analysis of variance, 210 
applications of Student's distribution, 
61 
design of experiments, 277, 278, 281 
discriminant function, 344 
distribution of χ² when parameter 
estimated from data, 108 
fiducial inference, 109 
Fisher, R. A. (cont.): 
k-statistics, 153 
measurement of information, 105 
measures of departure from normality, 
153, 155, 156-157 
table of t, 360 
tables of χ², 361 
z-distribution, 54-55 
Forecasting, 3 
Four-fold point surface, correlation of, 
146 
Frazer, R. A., 147 
Freedom, degrees of, see Degrees of 
freedom 
Freeman, F. N., 275 
Frequency theory of probability, see 
Probability 
Friedman, M., 172, 183 


G 


g₁, a measure of skewness, 153, 154, 161 

g₂, a measure of kurtosis, 154, 161 

Gaddum, J. H., 160, 163, 164, 168 

Galton, Sir Francis, 7 

Gamma function, 44 

Garrett, H. E., 326 

Gaussian error curve, see Normal curve 

Gauss, K. F., 28 

Generalized distance of Mahalanobis, 344 

Gibbs, Willard, 5 

Goodness of fit, test of, 63 

Goulden, C. H., 326 

Greco-Latin square, see Principles of 
experimentation 

Guttman, L., 183 


H 


Handy, L. M., 102 

Hansen, M. H., 208, 209 

Harmon, H. H., 353 

Hartley, H. O., 102 

Heterogeneity, condition of, 211 

Holzinger, K. J., 275, 353 

Homogeneity, condition of, 211 

Homogeneity of variance, assumption of, 
211 

Horst, P., 357 

Hotelling, H., 11, 61, 183, 326, 344, 357 

Hotelling’s T, see Multivariate analysis 

Houseman, E. E., 325 

Hoyt, C. J., 134, 148 

Hsu, P. L., 102 

Hudelson, Earl, 21 

Hurwitz, W. N., 208 

Hypothesis, role of, in science, 62 

Hypothesis, testing of, see Statistical 
hypothesis 




I 
Inference, 12, 14, see Statistical hypothe- 
sis 
Information: 


in small samples, 105 
invariance as measure of, 105 
relevant, 104 
Intelligence, distribution of, 160 
Interaction, see Analysis of variance 
Intra-class correlation, see Correlation 
Inverse probability, 109 
Item analysis, statistics used in, 147 


J 


Jackson, R. W. B., 148, 275, 357 
test of sensitivity, 129 

Jessen, R. J., 208 

Johnson-Neyman technique, 275 

Johnson, P. O., 168, 275, 326, 355 


K 


k-statistics: 
definition, 154 
general properties, 153 
sampling cumulants of, 154 
Kelley, T. L., 146 
Kendall, M. G., 16, 30, 183, 208, 209 
multiple rankings, 174, 175 
paired comparisons, 176 
randomness, 187 
random sampling, 200 
random sampling numbers, 201 
Kermack, W. O., 208 
Kinsey, A. C., 207 
Kollektiv of von Mises, 19, 25 
Kolmogoroff, A., 30 
probability as abstract ensembles, 20 
Kuder, G. F., 148 


L 


L-tests, 83, 128 
Lagrange’s undetermined multipliers, 219, 
221, 223, 227 
Laplace, 25, 28 
Laplacian-Gaussian error curve, see Normal 
curve 
Latin square, see Principles of experimentation 
Law: 
binomial, 26 
of chance, 31 
of error, 28 
of large numbers, 27 
of nature as statistical regularity, 4 
of single variable, 277 
Law (cont.): 
of small numbers, 27 
second law of thermodynamics as a 
statistical, 5 
Least squares, principle of, 28, 176, 284, 
328 
Lehmer, E., 68 
Lew, E. A., 209 
Likelihood-ratio tests of an hypothesis, 
66, 67, 68; (See L-tests) 
Lindquist, E. F., 209, 326 
Linear function, 27, 28 
Linear hypothesis, testing, 244 
Linearity of regression, see Regression 
Linear scale, 155 
Location, estimation of parameters of, 
193 


M 


m-rankings, the problem of, 174 
McCall, W. A., 168 
McKendrick, A. G., 208 
MacKenzie, W. A., 225 
McNemar, Quinn, 209 
Madhva, K. B., 209 
Madow, L., 209 
Madow, W. G., 209 
Mahalanobis, P. C., 61, 191, 194, 209, 
344, 357 
Markoff, A., 108, 194 
Marks, E. S., 207 
Martin, W., 357 
Maung, K., 357 
Maximum likelihood: 
estimate, 106 
consistency of, 106 
efficiency of, 106 
function of parameters, 106 
sufficiency of, 106 
Maximum likelihood method in estimat- 
ing parameters of a normal 
correlation surface, 123 
Maximum likelihood, principle of, 25 
Mean, arithmetic: 
distribution of, in normal samples, 33- 
37 
interval estimation of: 
from population of known variance, 
112 
from population of unknown vari- 
ance, 114 
significance of: 
from a known normal population, 69 
from an unknown normal popula- 
tion, 71 
from a small finite population, 71 
regressed mean, 91 
sufficient estimate, 106 
Mean square contingency, 146 
Measurement: 
of exceptionalness, 57 
of mental qualities, 158, 159 
role of statistics in, 5-6 
Median: 
confidence interval for, 117 
probability of, 110 
Merrington, M., 61, 103 
Minimum, absolute, 228, 233 
Minimum, relative, 229, 233 
Mises, R. von: 
Kollektiv, 25, 30 
probability as limit in frequencies, 19 
Moments: 
binomial, 56 
corrections in, for grouping, 150, 153 
definition of, 150 
efficiency of, 107 
method of, in fitting normal curve, 
149 
of normal distribution, 155 
Muhsam, H. V., 168 
Multiple correlation, see Correlation 
Multivariate analysis: 
discriminant function, 343 
illustration of, in a two-group prob- 
lem, 347-353 
mathematical derivation of, 344-345 
relation to theory of regression, 344 
generalized distance function, D², 344 
Hotelling’s T, 344 
significance of set of means, 343 
Nagel, Ernest, 30 
Nair, K. R., 148 
Nair’s Tables for confidence intervals of 
median, 119 
Nair, U. S., 102 
Nayer, P. P. N., 102 
Neyman, J., 68, 102, 148, 209, 275 
confidence intervals, 111 
representative method, 194 
sampling from finite population, 191 
testing hypotheses, 64, 66 
theory of estimation, 110 
Newman, Freeman, and Holzinger, 226 
Newman, H. H., 275 
Newton, Sir Isaac, 62 
Noble, Sister Mary Alfred, 357 
Nonnormal data: 
statistical analysis of, 169 
by analysis of variation, 172 
by coefficient of concordance, 174 
by method of paired comparison, 176 
by rank correlation, 170 
validity of z- or F-test, and t-test ap- 
plied to, 167 




Normal curve: 
Gaussian error, 27 
Laplacian-Gaussian error, 27 
reproductive property of, 27 
table of areas, 359 
Normal distribution, 27 
as limit of binomial, 27, 58 
Normal equations, 328, 329 
Doolittle method of solution of, 330, 333 
use of auxiliary quantities in, 330, 331 
Normality: 
of human traits, 160 
mathematical conditions for, 149 
tests of, 149, 153, 155-157, 160 
Normalization of frequency-function, see 
Transformation 
Normal probability curve, 149 
as a close approximation, 155 
Norton, H. W., 94, 102 
Null hypothesis, 63, 64, 217 


O 


Ockham, 62 

Olds, E. G., 183 

Ordinal number, 200 

Orthogonality, see Principles of experimentation 


P 


Parameter, 33 
degrees of freedom, as a, 46 
estimation of, see Estimation 
represented by a Greek letter, 33 
Partial correlation, 355 
Partial regression coefficients, 324 
Partial regression equation, see Multiple 
regression 
Pearson, E. S., 68, 102 
on testing hypotheses, 64 
principle of likelihood, 66 
sampling distributions of Beta statistics, 
151 
variation analysis in industry, 225 
Pearson, Karl, 30, 148, 168 
development of the χ²-test, 108 
method of moments, 107 
rank correlation equivalent, 169 
test of normality, 149, 152 
Percentage, experimental errors of, 164, 
165 
Peterson, A. S., 102 
Pillai, K. C. S., 141 
Plesset, I., 102 
Poisson distribution, 26 
calculations for, 97 
moments of, 27, 165 
transformations for, 165, 166 
Population: 
as basis of statistical theory, 18, 19, 
187, 200 


continuous, 187 
existent, 187 
finite and infinite, 187 
hypothetical, 22 
Power of a test, see Statistical hypothesis 
Precision of an experiment, 281 
Prediction, 327; (see also Principles of 
experimentation) 
in science, 62 
of elections, 190 
Principles of experimentation, applica- 
tion of (See also Design of Experi- 
ments) 
factorial design, 296 
efficiency and comprehensiveness of, 
297-298 
illustration of a 4 × 7 × 2 × 2 × 2 
design in psychology, 298-310 
problem of prediction illustrated in: 
orthogonal polynomials for deter- 
mining regression equation, 308- 
309 
splitting up degrees of freedom into 
orthogonal components, 308- 
310 
factorial design and covariance: 
illustration of, in a 2 × 3 × 3 × 3 
design, 311-324 
Greco-Latin square, 293 
orthogonality in, 293 
Latin square, 292-295 
randomized-block arrangement, 287- 
292 
single factor experiment, 286-287 
symmetrical incomplete randomized 
block design, 292 
Probability: 
basic rules of direct, 23 
definition of, 19-20 
experiment in, 50 
fiducial, 112 
inverse, 109 
of errors of the first and second kind, 
64 
posterior, 24 
prior, 24, 25 
range in value of, 24 
role of, in statistics, 14, 18, 32 
statements in interpretation of, 22-23 
statistical distributions and, 22, 109 
Probits, 160 
use of, in testing normality, 161, 162 
Problems for solution, 29-30, 60-61, 97-101, 
141-147, 266-275, 324-325, 353-356 


Q 


Quality control, tolerance limits in, 123 
Quantum statistics, 5 
Quetelet, A., 28 


R 


Randomness, 22, 25 
as criterion of sample, 187 
definition of, 187 
tests of, 188 
Random sampling numbers, 201 
Random sampling, see Sampling 